Methods of identifying synthetic transcriptional and translational regulatory elements, and compositions relating to same

ABSTRACT

Provided are methods of identifying oligonucleotides having transcriptional or translational activity by integrating the oligonucleotide into a eukaryotic cell genome such that the oligonucleotide is operatively linked to an expressible polynucleotide, and detecting a change in expression of the expressible polynucleotide due to the operatively linked oligonucleotide. Also provided are vectors useful for identifying an oligonucleotide having transcriptional or translational regulatory activity according to a method of the invention. In addition, isolated synthetic transcriptional or translational regulatory elements identified according to a method of the invention are provided, as are kits, which contain a vector useful for identifying a transcriptional or translational regulatory element, or an isolated synthetic transcriptional or translational regulatory element or plurality of such elements. Also provided are isolated transcriptional regulatory elements.

[0001] This application claims the benefit of priority under 35 U.S.C. §119(e) of U.S. Serial No. 60/230,956, filed Sep. 7, 2000; U.S. Serial No. 60/230,852, filed Sep. 7, 2000; U.S. Serial No. 60/207,804, filed May 30, 2000; U.S. Serial No. 60/186,496, filed Mar. 2, 2000; U.S. Serial No. 60/178,816, filed Jan. 28, 2000; and U.S. Ser. No. ______ (attorney docket SCRIP1370), filed Jan. 12, 2001, each of which is incorporated herein by reference.

[0002] This invention was made in part with government support under Grant No. MCB9982574 awarded by the National Science Foundation. The government has certain rights in this invention.

BACKGROUND OF THE INVENTION

[0003] 1. Field of the Invention

[0004] This invention relates to methods for producing nucleotide sequences having regulatory functions using cellular selection of random nucleotide sequences, and to the sequences so produced.

[0005] 2. Background Information

[0006] Every eukaryotic gene has a core promoter that resides at the extreme 5′ end of its transcription unit. Most core promoters contain common recognition sequences such as the TATA box and GC-rich motifs, which allow binding of RNA polymerase, the enzyme required for the synthesis of messenger RNA on DNA templates. The core promoter is essential for initiation of transcription. However, it alone usually does not contain all the information necessary for the modulated expression of a gene in different contexts in the developing or behaving organism. This contextual information is frequently provided by other regulatory elements such as enhancers and silencers, which reside in the gene at locations that are proximal to the core promoter either upstream or downstream from an initiation site of RNA transcription, and can be several kilobases away from the core promoter. In addition, the mRNA molecules transcribed from gene sequences contain translational regulatory elements, which regulate production of a polypeptide from the mRNA. For example, the mRNA can contain an internal ribosome entry site (IRES) sequence, which affects the manner in which ribosomes bind to an mRNA and initiate translation, and does not require interaction of the ribosome with the 5′ end of an mRNA transcript. Thus, an IRES element can confer an additional level of regulation on gene expression.

[0007] It is not completely understood how combinations of regulatory elements interact with the core promoter to achieve the remarkable contextual diversity of gene expression that exists during animal development and tissue regeneration, as well as the mis-regulation associated with pathological conditions such as neoplastic disorders. Understanding how this diversity comes about is a major goal of modern biology, and achievement of this goal would accelerate progress in a number of areas in cell biology, development, and medicine. For instance, synthetic promoters or IRESes that function in a tissue specific manner, and that are selected as markers of either healthy or diseased tissues, can be useful in diagnostic or therapeutic procedures, and in drug development. Such applications for these promoters also can extend our understanding of a variety of diseases, thus providing a means to develop therapeutic interventions.

[0008] Eukaryotic promoters are complex and frequently contain combinations of several transcriptional regulatory elements. These DNA motifs are recognized by specific proteins (transcription factors) that bind to the element and regulate transcription of a particular gene. Hundreds of DNA segments that participate in the regulation of transcription of genes in eukaryotic systems have been characterized. However, these elements and their corresponding transcription factors generally have been analyzed only as individual units, for example, as to how an element and its associated transcription factors regulate the expression of a particular gene in a specific context. As a result, the rules by which regulatory elements function either by themselves or in combination with other elements in the many genes in which these elements are found are not well understood.

[0009] An example of this complexity is provided by the specific interaction of activator protein 1 (AP-1) with the TPA responsive gene regulatory element (TRE), which is present in the promoter and enhancer regions of many eukaryotic genes. The TRE is bound by members of the fos and jun families of transcriptional regulatory proteins, which are recruited in a number of regulatory situations in gene expression, particularly under conditions involving the integration of growth factor signals. A TRE can be present in a regulatory region of a gene that is expressed only in the kidney during its differentiation or, alternatively, in a gene that is expressed constitutively by neural cell precursors. It is not known, however, how the element is selected to function in a very specific context in each of these different environments or, for example, whether other elements are involved in modulating the function of a TRE such as the ability to repress (or potentiate) activity from the TRE.

[0010] Compared to transcriptional control sequences, little is known about translational control sequences. Some IRESes have been identified in viruses, and more recently cellular mRNA sequences having IRES activity have been identified. Unlike transcriptional regulatory elements, however, small modular elements having translational regulatory activity, including IRES activity, have not been identified.

[0011] Currently, there is no general systematic framework for analyzing the anatomy of promoters, enhancers, IRESes and other transcriptional and translational regulatory elements, and it is unknown how the combination of several common transcriptional and translational motifs present in many of these regulatory elements functions cooperatively to create unique patterns of gene expression. For example, particular variations of nucleotides within a regulatory element may be able to function well in the context of a specific companion element, while other variants of the motif may be able to override the influences of neighboring elements. Thus, a need exists for methods to identify functional transcriptional and translational regulatory elements. The present invention satisfies this need and provides additional advantages.

SUMMARY OF THE INVENTION

[0012] The present invention relates to methods to create, select and assemble transcriptional or translational regulatory elements, including, for example, promoter, enhancer and IRES elements, and methods to examine the ability of such regulatory elements to modulate transcription or translation in eukaryotic cells. A method of the invention can utilize, for example, an expression vector construct, which allows the insertion of relatively small nucleotide sequences (oligonucleotides) to be examined for regulatory activity, and for the systematic testing and isolation of such a regulatory element.

[0013] A method of the invention provides an analytic tool and an engine of discovery for transcriptional and translational regulatory sequences, and can provide a basis for diagnostic applications. As such, the present invention also provides regulatory oligonucleotides that can be used in expression vectors for controlling gene expression in diagnostic and therapeutic applications, and provides vectors useful for identifying such transcriptional and translational regulatory elements.

[0014] The present invention relates to a method of identifying an oligonucleotide having transcriptional or translational regulatory activity in a eukaryotic cell. Such a method can be performed, for example, by integrating an oligonucleotide to be examined for transcriptional or translational regulatory activity into a eukaryotic cell genome, wherein the oligonucleotide is operatively linked to an expressible polynucleotide, and detecting a change in the level of expression of the expressible polynucleotide in the presence of the oligonucleotide as compared to the absence of the oligonucleotide. The expressible polynucleotide generally contains a cloning site such that the oligonucleotide can be operatively linked to the expressible polynucleotide by insertion into the cloning site, and also can contain a transcription initiator sequence. The expressible polynucleotide generally encodes a reporter polypeptide, which can be a fluorescent polypeptide, an antibiotic resistance polypeptide, a cell surface protein marker, an enzyme, or a peptide tag.

[0015] In one embodiment, the invention provides a method to identify an oligonucleotide having transcriptional regulatory activity, for example, promoter activity, enhancer activity, or silencer activity. The expressible polynucleotide generally is operatively linked to a minimal promoter, for example, a TATA box, a minimal enkephalin promoter, or a minimal SV40 early promoter. The expressible polynucleotide can comprise a monocistronic reporter cassette, which encodes a single reporter polypeptide, or can be a dicistronic reporter cassette, which includes, in operative linkage, a regulatory cassette comprising a minimal promoter and a cloning site, a nucleotide sequence encoding a first reporter polypeptide, a spacer sequence comprising an internal ribosome entry site (IRES), and a nucleotide sequence encoding a second reporter polypeptide, whereby an oligonucleotide to be examined for transcriptional regulatory activity is operatively linked to the dicistronic reporter cassette by insertion into the cloning site. The expressible polynucleotide can be contained in a vector, which can be a plasmid based vector such as the vectors exemplified by SEQ ID NO: 2 and SEQ ID NO: 3, or can be contained in a retroviral vector such as the vectors exemplified by SEQ ID NO: 1 and SEQ ID NO: 9.

[0016] The oligonucleotide to be examined for transcriptional activity can be a synthetic oligonucleotide, for example, a random oligonucleotide sequence such as an oligonucleotide in a library of randomized oligonucleotides, or a variegated oligonucleotide that is based on, but different from, a known oligonucleotide such as a known transcriptional regulatory element. The oligonucleotide to be examined for transcriptional activity also can be a portion of an oligonucleotide fragment of genomic DNA.
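The two kinds of synthetic input described above, randomized oligonucleotides and variegated oligonucleotides based on a known element, can be illustrated in silico. The following is a minimal sketch only, not the procedure used to prepare any library disclosed herein. The function names, the 18-nucleotide length, the number of substitutions, and the use of the AP-1/TRE consensus TGACTCA as a template are all assumptions chosen for illustration.

```python
import random

BASES = "ACGT"


def random_oligo(length, rng):
    """Return a fully randomized oligonucleotide of the given length (hypothetical helper)."""
    return "".join(rng.choice(BASES) for _ in range(length))


def variegate(template, n_changes, rng):
    """Return a variant of `template` that differs at exactly `n_changes` positions."""
    positions = rng.sample(range(len(template)), n_changes)
    seq = list(template)
    for pos in positions:
        # Pick a base other than the current one so every chosen position changes.
        alternatives = [b for b in BASES if b != seq[pos]]
        seq[pos] = rng.choice(alternatives)
    return "".join(seq)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Assumed library parameters: five random 18-mers and five variants of one known motif.
random_library = [random_oligo(18, rng) for _ in range(5)]
template = "TGACTCA"  # illustrative known element (AP-1/TRE consensus)
variegated_library = [variegate(template, 2, rng) for _ in range(5)]
```

A real library would be far larger and would be synthesized chemically; the sketch only makes the distinction between the two input types concrete.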

[0017] In another embodiment, the invention provides a method to identify an oligonucleotide having translational regulatory activity, for example, a translational enhancer or inhibitor or an IRES element. In such a method, the expressible polynucleotide includes a promoter, which generally is a strong promoter such as an RSV promoter or CMV promoter or the like. The expressible polynucleotide can include a monocistronic reporter cassette or dicistronic reporter cassette. Preferably, where the oligonucleotide is to be examined for IRES activity, the expressible polynucleotide includes a dicistronic reporter cassette, which contains, in operative linkage, a regulatory cassette comprising a promoter, a nucleotide sequence encoding a first reporter polypeptide, a spacer sequence comprising a cloning site, and a nucleotide sequence encoding a second reporter polypeptide, whereby an oligonucleotide to be examined for IRES activity is operatively linked to the nucleotide sequence encoding the second reporter polypeptide by insertion into the cloning site. The expressible polynucleotide can be contained in a vector, for example, a retroviral vector such as that exemplified by SEQ ID NO: 109.

[0018] The oligonucleotide to be examined for translational activity can be a synthetic oligonucleotide, for example, a random oligonucleotide sequence such as an oligonucleotide in a library of randomized oligonucleotides, or a variegated oligonucleotide that is based on, but different from, a known oligonucleotide such as a known translational regulatory element. The oligonucleotide to be examined for translational activity also can be a portion of a cDNA encoding a 5′ untranslated region of an mRNA, or can be an oligonucleotide fragment of genomic DNA. In addition, the oligonucleotide to be examined for translational regulatory activity can be based on a sequence complementary to an oligonucleotide sequence of rRNA, preferably an un-base paired oligonucleotide sequence of rRNA, including, for example, a variegated population of oligonucleotide sequences derived from an oligonucleotide sequence complementary to an un-base paired region of a rRNA.

[0019] In one embodiment, a method of the invention is performed such that the oligonucleotide to be examined for transcriptional or translational regulatory activity is operatively linked to the expressible polynucleotide prior to integrating into the eukaryotic cell genome. In another embodiment, the expressible polynucleotide is an endogenous polynucleotide in the eukaryotic cell genome, and the oligonucleotide to be examined for regulatory activity is introduced into a cell containing the expressible polynucleotide and operatively linked to the endogenous polynucleotide, for example, by homologous recombination.

[0020] In yet another embodiment, the eukaryotic cell is a cell of a transgenic non-human eukaryote, wherein the cell contains a transgene. The transgene can be, for example, a recombinase recognition site that is positioned with respect to an endogenous expressible polynucleotide such that an oligonucleotide inserted into the site is operatively linked to the polynucleotide. The transgene also can be a heterologous expressible polynucleotide, which is stably maintained in the eukaryotic cell genome, and can contain a cloning site for insertion of the oligonucleotide to be examined. In one embodiment, the oligonucleotide is an oligonucleotide to be examined for transcriptional regulatory activity, and the transgene is a dicistronic reporter cassette comprising, in operative linkage, a regulatory cassette comprising a minimal promoter and a cloning site, a first reporter cassette, a spacer sequence comprising an internal ribosome entry site (IRES), and a second reporter cassette, whereby the oligonucleotide is operatively linked to the dicistronic reporter cassette by insertion into the cloning site. In another embodiment, the oligonucleotide is an oligonucleotide to be examined for translational regulatory activity, and the transgene is a dicistronic reporter cassette comprising, in operative linkage, a regulatory cassette comprising a promoter, a first reporter cassette, a spacer sequence comprising a cloning site, and a second reporter cassette, whereby the oligonucleotide is operatively linked to the second cistron by insertion into the cloning site.

[0021] A method of the invention also can be performed by cloning a library of oligonucleotides to be examined for transcriptional or translational regulatory activity into multiple copies of an expression vector comprising an expressible polynucleotide, whereby the oligonucleotides are operatively linked to the expressible polynucleotide, thereby obtaining a library of vectors; contacting the library of vectors with eukaryotic cells under conditions such that the vectors are introduced into the cells and integrate into a chromosome in the cells; and detecting expression of an expressible polynucleotide operatively linked to an oligonucleotide at a level other than a level of expression of the expressible polynucleotide in the absence of the oligonucleotide. The eukaryotic cells can be any eukaryotic cells, including insect, yeast, amphibian, reptilian, avian or mammalian cells. Preferably, the cells are mammalian cells, including, for example, neuronal cells, fibroblasts, hepatic cells, bone marrow cells, bone marrow derived cells, muscle cells and epithelial cells. The library of oligonucleotides can be, for example, a library of randomized oligonucleotides, a library of variegated oligonucleotides based on a selected oligonucleotide sequence, or a library of genomic DNA fragments.
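The detecting step above, expression "at a level other than" the level seen without the oligonucleotide, can be made concrete as a comparison against a control measurement. The sketch below is illustrative only. The function name, the clone identifiers, the reporter values, and the two-fold threshold are hypothetical and are not taken from this disclosure.

```python
def select_active_clones(signals, control, min_fold=2.0):
    """Return {clone_id: fold_change} for clones whose reporter signal differs
    from the no-oligonucleotide control by at least `min_fold` in either direction.

    `signals` maps a clone identifier to a reporter measurement; `control` is the
    measurement for the same vector without an inserted oligonucleotide.
    """
    if control <= 0:
        raise ValueError("control signal must be positive")
    hits = {}
    for clone_id, signal in signals.items():
        fold = signal / control
        # Increased expression (enhancer-like) or decreased expression (silencer-like).
        if fold >= min_fold or fold <= 1.0 / min_fold:
            hits[clone_id] = fold
    return hits


# Hypothetical reporter readings in arbitrary units.
signals = {"clone_a": 120.0, "clone_b": 410.0, "clone_c": 35.0, "clone_d": 95.0}
hits = select_active_clones(signals, control=100.0)
```

With these made-up numbers, `clone_b` (increased) and `clone_c` (decreased) are retained, while `clone_a` and `clone_d` fall within two-fold of the control and are not.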

[0022] In one embodiment, the oligonucleotide is an oligonucleotide to be examined for transcriptional regulatory activity, and the expressible polynucleotide comprises, in operative linkage, a regulatory cassette comprising a minimal promoter and a cloning site, and a reporter cassette, whereby the oligonucleotide is operatively linked to the expressible polynucleotide by insertion into the cloning site. In another embodiment, the oligonucleotide is an oligonucleotide to be examined for transcriptional regulatory activity, and the expressible polynucleotide comprises a dicistronic reporter cassette comprising, in operative linkage, a regulatory cassette comprising a minimal promoter and a cloning site, a nucleotide sequence encoding a first reporter polypeptide, a spacer sequence comprising an internal ribosome entry site (IRES), and a nucleotide sequence encoding a second reporter polypeptide, whereby the oligonucleotide is operatively linked to the dicistronic reporter cassette by insertion into the cloning site. The expressible polynucleotide can be contained in a vector, for example, a plasmid vector as exemplified by SEQ ID NO: 2 and SEQ ID NO: 3 or a retroviral vector as exemplified by SEQ ID NO: 1 and SEQ ID NO: 9.

[0023] A method of identifying an oligonucleotide having transcriptional regulatory activity can further include selecting a population of cells expressing the expressible polynucleotide operatively linked to an oligonucleotide at a level other than a level of expression of the expressible polynucleotide in the absence of the oligonucleotide. Furthermore, the method can further include isolating the operatively linked oligonucleotide. As such, the present invention provides an isolated synthetic transcriptional regulatory element obtained by the disclosed method, and further provides a recombinant nucleic acid molecule comprising a plurality of operatively linked isolated transcriptional regulatory elements, which can be the same or different.

[0024] In still another embodiment, the oligonucleotide is an oligonucleotide to be examined for translational regulatory activity, and the expressible polynucleotide is a dicistronic reporter cassette comprising, in operative linkage, a regulatory cassette comprising a promoter, a nucleotide sequence encoding a first reporter polypeptide, a spacer sequence comprising a cloning site, and a nucleotide sequence encoding a second reporter polypeptide, whereby the oligonucleotide is operatively linked to the second cistron by insertion into the cloning site. The expressible polynucleotide can be contained in a vector, for example, a plasmid vector or a retroviral vector as exemplified by SEQ ID NO: 109. The method can include further selecting a population of cells expressing the expressible polynucleotide operatively linked to an oligonucleotide at a level other than a level of expression of the expressible polynucleotide in the absence of the oligonucleotide, and can include a step of isolating the operatively linked oligonucleotide. As such, the invention provides an isolated synthetic translational regulatory element, for example, an IRES element, which is obtained using the disclosed method, as well as a recombinant nucleic acid molecule comprising a plurality of operatively linked isolated translational regulatory elements, which can be the same or different.

[0025] The present invention also relates to an integrating expression vector useful for identifying an oligonucleotide having transcriptional or translational regulatory activity. An integrating expression vector for identifying a transcriptional regulatory element can contain, for example, in operative linkage in a 5′ to 3′ orientation, a long terminal repeat (LTR) containing an immediate early gene promoter, an R region, a U5 region, a truncated gag gene comprising sequences required for retrovirus packaging, a dicistronic reporter cassette including a nucleotide sequence encoding a first reporter polypeptide, a spacer sequence containing an IRES, a nucleotide sequence encoding a second reporter polypeptide, and a regulatory cassette containing a cloning site and a minimal promoter, and an LTR. The first and second reporter polypeptides independently can be selected from a fluorescent polypeptide such as green fluorescent protein, cyan fluorescent protein, red fluorescent protein, or an enhanced form thereof, an antibiotic resistance polypeptide such as puromycin N-acetyltransferase, hygromycin B phosphotransferase, neomycin (aminoglycoside) phosphotransferase, and the Sh ble gene product, a cell surface protein marker such as neural cell adhesion molecule (N-CAM), an enzyme such as β-galactosidase, chloramphenicol acetyltransferase, luciferase, and alkaline phosphatase, or a peptide tag such as a c-myc peptide, a polyhistidine, or the like. For example, the first reporter polypeptide can be puromycin N-acetyltransferase and the second reporter polypeptide can be enhanced green fluorescent protein; or the first reporter polypeptide can be puromycin N-acetyltransferase and the second reporter polypeptide can be N-CAM.

[0026] The cloning site can be any sequence that facilitates insertion of an oligonucleotide in operative linkage to the expressible polynucleotide, for example, a restriction endonuclease recognition site or a multiple cloning site containing a plurality of such sites, or a recombinase recognition site such as a lox sequence or an att sequence. The minimal promoter can be any minimal promoter, for example, a TATA box, a minimal enkephalin promoter, or a minimal SV40 early promoter. Examples of integrating expression vectors of the invention are set forth as SEQ ID NO: 1 and SEQ ID NO: 9, and additional expression vectors, which can integrate into a cell genome, are exemplified by SEQ ID NO: 2 and SEQ ID NO: 3.

[0027] An integrating expression vector for identifying an oligonucleotide having translational regulatory activity, particularly IRES activity, can contain, for example, in operative linkage in a 5′ to 3′ orientation, a long terminal repeat (LTR) containing an immediate early gene promoter, an R region, a U5 region, a truncated gag gene comprising sequences required for retrovirus packaging, a dicistronic reporter cassette including a nucleotide sequence encoding a first reporter polypeptide, a spacer sequence comprising a cloning site, a nucleotide sequence encoding a second reporter polypeptide, and a regulatory cassette comprising a promoter, and an LTR. The first and second reporter polypeptides independently can be any reporter polypeptide as disclosed herein or otherwise known in the art. For example, the first reporter polypeptide can be enhanced green fluorescent protein and the second reporter polypeptide can be enhanced cyan fluorescent protein. An example of an integrating expression vector is provided by SEQ ID NO: 109.

[0028] A method of the invention provides a means to identify a transcriptional regulatory element. According to one embodiment, oligonucleotides in a library of synthetic DNA sequence elements are positioned next to a minimal (core) promoter and screened for activity in mammalian cells using a high throughput selection strategy. The selection process can identify a variety of individual transcriptional regulatory oligonucleotide sequences that can enhance gene expression from the minimal eukaryotic promoter. In another embodiment, a selected transcriptionally active element or an oligonucleotide to be examined for transcriptional regulatory activity and a known regulatory motif are combined to produce promoter/enhancer element cassettes. By varying the order, number and spacing of elements in these cassettes and subsequently selecting for promoter activity, transcriptional regulatory elements having desirable characteristics can be isolated and the rules that govern functional interactions between elements can be determined.
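The idea of varying the order and spacing of elements in a cassette can be sketched as a simple enumeration. The code below is an illustration under stated assumptions and is not a disclosed design procedure: the function name is hypothetical, the two motifs (an AP-1/TRE consensus and a GC box) are chosen only as familiar examples, and the poly-A spacer is an arbitrary placeholder. The sketch varies order and spacing only; varying the number of copies of each element would need an additional loop.

```python
from itertools import permutations, product


def cassette_designs(elements, spacer_lengths, spacer_base="A"):
    """Enumerate cassettes containing each element once, in every order,
    with every combination of spacer lengths between adjacent elements."""
    designs = []
    for order in permutations(elements):
        # One spacer length is chosen for each gap between neighbouring elements.
        for gaps in product(spacer_lengths, repeat=len(order) - 1):
            parts = [order[0]]
            for gap, element in zip(gaps, order[1:]):
                parts.append(spacer_base * gap)
                parts.append(element)
            designs.append("".join(parts))
    return designs


# Illustrative motifs only; not sequences taken from this disclosure.
elements = ["TGACTCA", "GGGCGG"]
designs = cassette_designs(elements, spacer_lengths=[0, 5])
```

Two elements and two candidate spacer lengths give four cassettes here; the count grows quickly with more elements, which is one reason a selection-based screen is attractive compared with testing each arrangement individually.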

[0029] A method of the invention also provides a means to identify an oligonucleotide that confers a transcriptional regulatory function on an operatively linked polynucleotide in a eukaryotic cell. The method can be performed, for example, by operatively linking an oligonucleotide to be examined for transcriptional regulatory activity to an expressible polynucleotide, the expression of which can be driven by a minimal promoter, and detecting an increased or decreased level of transcription of the polynucleotide due to the presence of the oligonucleotide. The transcriptional activity due to the oligonucleotide can be examined in vitro or in vivo in a cell in culture or in an organism. In one embodiment, the transcriptional activity is examined in a cell in vivo following integration of the construct comprising the oligonucleotide and expressible polynucleotide into a chromosome in the cell. Such a method provides a means to identify a regulatory element that can act by inducing a local change in the DNA or chromatin conformation, for example, DNA bending, which can increase access of the transcription machinery to the sequence to be transcribed. Such regulatory elements cannot be detected using methods that rely exclusively on identifying a protein/DNA interaction as a means to identify a regulatory element.

[0030] A method of identifying an oligonucleotide that confers transcriptional regulatory activity also can be performed by providing an expression vector, which contains a reporter cassette comprising a nucleotide sequence encoding a reporter molecule, wherein the reporter cassette is operatively linked to a regulatory cassette comprising a minimal promoter element; cloning a library of randomized oligonucleotides into multiple copies of the expression vector, wherein an oligonucleotide of the library is operatively linked to a minimal promoter element, and wherein the randomized oligonucleotide can potentially function as a transcriptional regulatory sequence, to form a library of vectors that differ in the potential regulatory sequences; transfecting eukaryotic cells with the library of different vectors to form transfected eukaryotic host cells; culturing the transfected eukaryotic cells under conditions suitable for integration of the vector into the host cell and expression of the reporter molecule; selecting a population of transfected eukaryotic cells that express the reporter molecule; and obtaining from the selected population of cells, transcriptional regulatory sequences, which can be a library of transcriptional regulatory sequences.

[0031] Optionally, a reporter cassette useful for identifying a transcriptional regulatory element according to a method of the invention is a dicistronic construct that includes the nucleotide sequence encoding the first reporter molecule, and also includes a second nucleotide sequence encoding a second selectable marker, which is different from the first reporter molecule. Preferably, the dicistronic construct includes an IRES element in the intercistronic sequence. Such a construct facilitates the identification and isolation of transcriptional regulatory oligonucleotides.

[0032] A method of the invention also provides a means to identify a translational regulatory element, including a translational enhancer, an IRES element, and the like. According to one embodiment, a complex library of synthetic DNA sequence elements is positioned in an intervening sequence between first and second nucleotide sequences that encode first and second reporter molecules in a dicistronic reporter cassette, and screened for translational regulatory activity in a eukaryotic cell, for example, a mammalian cell, optionally using a high throughput selection strategy. Using such a method, a variety of regulatory oligonucleotide sequences that initiate cap-independent translation of the second reporter molecule and, therefore, function as IRES sequences have been identified. In another embodiment, a selected translational regulatory element is combined with a known regulatory motif such that, by varying the order, number and spacing of elements in a reporter cassette and subsequently selecting for expression, translational regulatory elements having desirable characteristics can be isolated and the rules that govern functional interactions between elements can be determined.

[0033] A method of the invention provides a means to identify an oligonucleotide that confers a translational regulatory function on an operatively linked polynucleotide in a eukaryotic cell. Such a method can be performed, for example, by operatively linking an oligonucleotide to be examined for translational regulatory activity to an expressible polynucleotide, which includes or encodes the elements generally required for translation such as start and stop codons (i.e., a cistron), and detecting an increased or decreased level of translation of the polynucleotide due to the presence of the oligonucleotide. The translational activity due to the oligonucleotide can be examined in vitro or in vivo in a cell in culture or in an organism. In one embodiment, the translational activity is examined in a cell in vivo following integration of the construct comprising the oligonucleotide and expressible polynucleotide into a chromosome in the cell.

[0034] A method of identifying an oligonucleotide having translational regulatory activity also can be practiced by providing an expression vector comprising a dicistronic reporter cassette, which includes a first nucleotide sequence encoding a first reporter protein and a second nucleotide sequence encoding a second reporter protein, which is different from the first reporter protein, wherein the dicistronic reporter cassette is operatively linked to a regulatory cassette comprising a promoter element, and wherein the reporter cassette contains an intercistronic spacer nucleotide sequence between the first and second encoding nucleotide sequences such that an oligonucleotide to be examined for translational regulatory activity can be introduced into the spacer sequence and is operatively linked to the second nucleotide sequence; cloning the oligonucleotides of a library of randomized oligonucleotides into multiple copies of said expression vector, wherein an oligonucleotide is introduced into the spacer nucleotide sequence, and wherein the randomized oligonucleotide potentially functions as a translational regulatory sequence, to form a library of vectors differing in said potential regulatory sequences; transfecting eukaryotic cells with the library of different vectors to form transfected eukaryotic host cells; culturing the transfected eukaryotic cells under conditions suitable for integration of the vector into the host cell and expression of said first and second reporter proteins; selecting a population of transfected eukaryotic cells that express said second reporter protein; and obtaining from the selected population of cells oligonucleotides that function as translational regulatory sequences.
A reporter protein (and encoding nucleotide sequence) useful in a method or composition of the invention can be any reporter protein, as disclosed herein, including a fluorescent, luminescent or chemiluminescent protein, an enzyme, a receptor (or ligand), a protein that confers resistance to an antibiotic or other toxic agent, and the like. The reporter molecule can be selected, for example, based on its cost, convenience, availability or other such factor, and generally provides a means to identify and, if desired, isolate a cell expressing the reporter molecule.
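The selecting step of the dicistronic screen, retaining cells that express the second reporter protein, can be sketched as a gate on two measurements per cell. This is an illustration only. The function name, the cell identifiers, the reporter values, and the background thresholds are hypothetical. Requiring the first reporter to be above background as well is an added assumption of this sketch, intended to confirm that the dicistronic transcript is present in the cell; it is not a step recited above.

```python
def select_second_reporter_positive(cells, background_first, background_second):
    """Return identifiers of cells in which both reporters exceed background.

    `cells` maps a cell identifier to a (first_reporter, second_reporter) pair
    of measurements in arbitrary units.
    """
    selected = []
    for cell_id, (first, second) in cells.items():
        # The first reporter indicates that the cassette is expressed; the second
        # reporter indicates translation initiated from the inserted spacer sequence.
        if first > background_first and second > background_second:
            selected.append(cell_id)
    return selected


# Hypothetical per-cell measurements.
cells = {
    "cell_1": (900.0, 15.0),   # first reporter only
    "cell_2": (850.0, 400.0),  # both reporters
    "cell_3": (20.0, 380.0),   # second reporter only
}
selected = select_second_reporter_positive(cells, background_first=50.0, background_second=50.0)
```

In this made-up data set only `cell_2` passes the gate. The inserted oligonucleotides would then be recovered from the selected population, as the method describes.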

[0035] The present invention also provides isolated synthetic transcriptional or translational regulatory oligonucleotides, which can be identified and isolated using a method as disclosed herein. Such synthetic regulatory oligonucleotides can be useful for regulating the expression of an operatively linked polynucleotide, and can be particularly useful for conferring tissue specific, developmental stage specific, or the like expression of the polynucleotide, including constitutive or inducible expression. A synthetic regulatory oligonucleotide of the invention also can be a component of an expression vector or of a recombinant nucleic acid molecule comprising the regulatory oligonucleotide operatively linked to an expressible polynucleotide.

[0036] Accordingly, the present invention provides compositions comprising an oligonucleotide of the invention. In one embodiment, the composition is a vector, which generally is an expression vector and can be an integrating expression vector that, upon being introduced into a cell, can integrate into the genome of the cell, particularly a eukaryotic cell. As such, the invention also provides a host cell containing a synthetic transcriptional or translational regulatory oligonucleotide of the invention, which can be operatively linked to a heterologous polynucleotide. Also provided is a recombinant nucleic acid molecule, which contains a transcriptional or translational regulatory element of the invention operatively linked to an expressible polynucleotide, which is heterologous to the regulatory element.

[0037] The present invention also provides systems, which can be in kit form and are useful for practicing aspects of the present invention. The kit generally contains an oligonucleotide of the invention or contains a reagent for identifying a transcriptional or translational regulatory element according to a method of the invention. In one embodiment, the kit contains a synthetic regulatory oligonucleotide, which can be in an isolated form or can be a component of a vector or a recombinant nucleic acid molecule. The kit also can contain a plurality of synthetic transcriptional or translational regulatory oligonucleotides or combinations thereof, which, optionally, contain additional sequences that facilitate linking the regulatory oligonucleotide to a second nucleotide sequence, which can be a vector, for example. Such a plurality of synthetic regulatory elements in kit form provides a convenient means to select a regulatory element having desired characteristics, for example, tissue specific expression or a low level of constitutive expression or other characteristic. In another embodiment, the kit contains a vector for identifying a transcriptional or translational regulatory element, for example, an integrating expression vector.

BRIEF DESCRIPTION OF THE DRAWINGS

[0038] FIG. 1 illustrates a portion of the MESVR/EGFP*/IRESpacPro(ori) vector (nucleotides 3592 to 3726 of SEQ ID NO: 1), including the upstream long terminal repeat (LTR) U3 region, which contains the RSV immediate early gene promoter (R) to drive high levels of viral RNA genome production and the U5 sequence. Δgag indicates the region of truncation of the group specific antigen gene; EGFP indicates enhanced green fluorescent protein; IRES indicates internal ribosome entry site; PAC indicates puromycin N-acetyltransferase coding sequence. Dotted lines indicate an expanded view of the synthetic promoter (Promoter) located in the downstream LTR U3 region. This promoter contains a multiple cloning site (Nsi I-Bgl II), TATA box and consensus initiator (Inr) sequences. The position at which the synthetic promoter fuses into the downstream R region is indicated.

[0039] FIGS. 2A to 2C illustrate maps of various expression vectors useful for identifying an oligonucleotide regulatory element.

[0040] FIG. 2A illustrates the vector pnZ-MEK (SEQ ID NO: 2). Various restriction endonuclease recognition sites are indicated. MEK indicates minimal enkephalin promoter; Zeocin®, NeoR and bla^(P) indicate coding sequences for polypeptides conferring Zeocin® (bleomycin), neomycin and kanamycin resistance, respectively. SV40 intron and SV40 polyA⁺ signal sequence are indicated. TK polyA⁺ indicates thymidine kinase polyA⁺ signal sequence. ColE1 ori indicates E. coli origin of replication.

[0041] FIG. 2B illustrates the vector pnL-MEK. Various sites and sequences are as in FIG. 2A. Luciferase indicates luciferase coding sequence.

[0042] FIG. 2C illustrates the vector pnH-MEK (SEQ ID NO: 3). Various sites and sequences are as in FIG. 2A. Hygromycin^(R) indicates coding sequence for polypeptide conferring hygromycin B resistance.

[0043] FIG. 3 illustrates the retroviral vector MESVR/EGFP/ECFP/RSVPro(ori-) (SEQ ID NO: 109). Various restriction endonuclease recognition sites are indicated.

[0044] FIG. 4 shows the region of complementarity of the ICS1-23 sequence (SEQ ID NO: 105) and 18S rRNA (SEQ ID NO: 107). “a” and “b” indicate portions of the ICS1-23 sequence (SEQ ID NO: 105).

[0045] FIG. 5 shows the complementary sequence matches between YAP1 or p150 leader sequences and 18S rRNA. SEQ ID NOS: are indicated. Vertical lines indicate base pairing and open circles represent GU base pairing. The longest uninterrupted stretches of complementarity for each match are indicated by the shaded nucleotides.

[0046] FIGS. 6A and 6B illustrate sites in which IRES modules of the invention share complementarity to mouse 18S ribosomal RNA (rRNA; SEQ ID NO: 196).

[0047] FIG. 6A provides a linear representation of the 18S rRNA; the vertical lines below the linear representation are sites at which selected IRES modules share 8 or 9 nucleotides of complementarity with the 18S rRNA sequence.

[0048] FIG. 6B shows a secondary structure of the 18S rRNA, and the dark bars indicate the positions of the complementary sequence matches to selected IRES modules of the invention.

DETAILED DESCRIPTION OF THE INVENTION

[0049] The present invention provides methods for identifying synthetic transcriptional and translational regulatory elements, vectors useful for identifying such regulatory elements, and isolated regulatory elements, which comprise oligonucleotide sequences that, when present in a gene expression context in a eukaryotic cell, can confer a regulatory function onto the gene or a polynucleotide encoded by the gene. The gene segment or other expressible polynucleotide can be in any expression construct engineered for expression in a eukaryotic cell, particularly in the form of a chromosome-associated polynucleotide, which is subject to the nuances of complexity associated with gene expression in a chromosome as compared, for example, to an episomal (extra-chromosomal) element. A chromosomal context often is a consequence of a gene therapy procedure, wherein the transgene integrates into the chromosome.

[0050] A method of identifying a transcriptional regulatory element can be performed in various ways, as disclosed herein (see, also, Edelman et al., Proc. Natl. Acad. Sci. USA, 97:3038-3043, 2000, which is incorporated herein by reference). In one embodiment, an oligonucleotide to be examined for transcriptional regulatory activity is operatively linked to an expressible polynucleotide, which is or can be operatively linked to a minimal promoter, and a change in the level of expression of the polynucleotide identifies the oligonucleotide as a transcriptional regulatory oligonucleotide. As used herein, the term “transcriptional regulatory oligonucleotide” or “transcriptional regulatory element” or the like refers to a nucleotide sequence that can affect the level of transcription of an operatively linked polynucleotide. Thus, the term encompasses oligonucleotide sequences that increase the level of transcription of a polynucleotide, for example, a promoter element or an enhancer element, or that decrease the level of transcription of a polynucleotide, for example, a silencer element. As disclosed herein, a transcriptional regulatory element can be constitutively active or inducible, which can be inducible from an inactive state or from a basal state, and can be tissue specific or developmental stage specific, or the like.

[0051] As disclosed herein, the present methods provide a means for identifying and isolating a translational regulatory element that confers tissue specific or inducible translation on an operatively linked expressible polynucleotide. As used herein, the term “tissue specific,” when used in reference to a translational regulatory element, means a nucleotide sequence that effects translation of an operatively linked expressible polynucleotide in only one or a few cell types. As used herein, the term “inducible,” when used in reference to a translational regulatory element, means a nucleotide sequence that, when present in a cell exposed to an inducing agent, effects an increased level of translation of an operatively linked expressible polynucleotide as compared to the level of translation, if any, in the absence of an inducing agent.

[0052] The term “inducing agent” is used to refer to a chemical, biological or physical agent that effects translation from an inducible translational regulatory element. In response to exposure to an inducing agent, translation from the element generally is initiated de novo or is increased above a basal or constitutive level of expression. Such induction can be identified using the methods disclosed herein, including detecting an increased level of a reporter polypeptide encoded by the expressible polynucleotide that is operatively linked to the translational regulatory element. An inducing agent can be, for example, a stress condition to which a cell is exposed, for example, a heat or cold shock, a toxic agent such as a heavy metal ion, or a lack of a nutrient, hormone, growth factor, or the like; or can be exposure to a molecule that affects the growth or differentiation state of a cell such as a hormone or a growth factor. As disclosed herein, the translational regulatory activity of an oligonucleotide can be examined in cells that are exposed to particular conditions or agents, or in cells of a particular cell type, and oligonucleotides that have translational regulatory activity in response to and only under the specified conditions or in a specific cell type can be identified.

[0053] As used herein, the term “expressible polynucleotide” refers broadly to a nucleotide sequence that can be transcribed or translated. Generally, an expressible polynucleotide is a polydeoxyribonucleotide, which can be transcribed in whole or in part into a polyribonucleotide, or is a polyribonucleotide that can be translated in whole or in part into a polypeptide. The expressible polynucleotide can include, in addition to a transcribed or translated sequence, additional sequences required for transcription such as a promoter element, a transcription start site, a polyadenylation signal, and the like; or for translation such as a start codon, a stop codon and the like; or can be operatively linked to such sequences, which can be contained, for example, in a vector into which the polynucleotide is inserted. As such, the term “cistron” also is used herein to refer to an expressible polynucleotide that includes all or substantially all of the elements required for expression of an encoded polypeptide. Examples of expressible polynucleotides include nucleotide sequences encoding a reporter polypeptide or other selectable marker, or a nucleotide sequence encoding a polypeptide of interest, for example, a polypeptide that is to be expressed in a cell as a means to produce the polypeptide in a convenient and commercially useful manner, or as part of a gene therapy treatment.

[0054] An oligonucleotide to be examined for transcriptional (or translational) activity can be operatively linked to an expressible polynucleotide, which, for example, can encode a reporter molecule. As used herein, the term “operatively linked” or “functionally adjacent” means that a regulatory element, which can be a synthetic regulatory oligonucleotide of the invention or an oligonucleotide to be examined for such activity, is positioned with respect to a transcribable or translatable nucleotide sequence such that the regulatory element can effect its regulatory activity. An oligonucleotide having transcriptional enhancer activity, for example, can be located at any distance, including adjacent to or up to thousands of nucleotides away from, and upstream or downstream from the promoter, which can be a minimal promoter element, and nucleotide sequence to be transcribed, and still exert a detectable effect on the level of expression of an encoded reporter molecule. In comparison, a translational regulatory element generally is positioned within about 1 to 500 nucleotides, particularly within about 1 to 100 nucleotides of a translation start site. For a variety of considerations such as convenience of manipulations, and subsequent use of discrete promoter/enhancer constructs identified by the present invention, an oligonucleotide to be examined for transcriptional enhancer activity generally is positioned relatively close to the minimal promoter element, for example, within about 1 to 100 nucleotides, preferably within about 3 to 50 nucleotides of the promoter.

[0055] The term “operatively linked” also is used herein with respect to a first and second polypeptide (or peptide) to refer to encoding sequences that are linked in frame such that a fusion polypeptide can be produced. Similarly, the term is used to refer to two or more cistrons of an expressible polynucleotide that are transcribed as a single RNA molecule, which can contain, for example, an IRES element of the invention in an intercistronic position.

[0056] A method of identifying a transcriptional regulatory element can be performed using an expression vector, which contains a reporter cassette comprising a nucleotide sequence encoding at least a first reporter molecule, wherein the reporter cassette is operatively linked to a regulatory cassette comprising a minimal promoter element. The reporter cassette functions to indicate (report) that the reporter molecule has been expressed by means of expression of the detectable reporter molecule. The reporter cassette is expressed under the control of (operatively linked to) the regulatory cassette, which also contains cloning sites for the introduction of an oligonucleotide to be examined for transcriptional regulatory activity, and further contains a minimal promoter element such that, upon introduction of a regulatory oligonucleotide, expression of the reporter cassette is altered.

[0057] A library of randomized oligonucleotides to be examined for transcriptional regulatory activity can be provided, and one or more individual members of the library can be cloned into multiple copies of the regulatory cassette of the expression vector. The oligonucleotide to be examined for transcriptional regulatory activity is introduced such that it is operatively linked to the minimal promoter element in the regulatory cassette and, therefore, has the potential to function as a transcriptional regulatory element. In this way, a library of different constructs, which can be contained in a vector, is formed, each construct differing in the introduced potential regulatory oligonucleotide sequence.
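
As an illustration only, the formation of a library of constructs that differ solely in the introduced oligonucleotide can be modeled in silico. The following is a minimal sketch, not the disclosed cloning procedure: the oligonucleotide length (18), the library size, and the function names are assumptions, and the flanking strings (the Nsi I and Bgl II recognition sequences) are simple placeholders for the vector sequence on either side of the cloning site.

```python
import random

BASES = "ACGT"

def random_oligo_library(n_members, length, seed=None):
    """Return n_members random DNA sequences of the given length (hypothetical helper)."""
    rng = random.Random(seed)  # seeded generator so a run can be reproduced
    return ["".join(rng.choice(BASES) for _ in range(length))
            for _ in range(n_members)]

def clone_into_cassette(oligo, upstream, downstream):
    """Model insertion of an oligo between two flanking vector sequences."""
    return upstream + oligo + downstream

# Assumed values for illustration: 5 members, 18 nucleotides each.
library = random_oligo_library(n_members=5, length=18, seed=0)

# Placeholder flanks: Nsi I (ATGCAT) and Bgl II (AGATCT) recognition sequences.
constructs = [clone_into_cassette(o, "ATGCAT", "AGATCT") for o in library]

for construct in constructs:
    print(construct)
```

Each printed construct shares the same flanks and differs only in the randomized insert, which is the property the library relies on.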

[0058] The oligonucleotide sequences to be examined for transcriptional (or translational) regulatory activity also can be sequences isolated from genomic DNA (or mRNA) of a cell. For example, oligonucleotides to be examined for transcriptional regulatory activity can be obtained using an antibody that is specific for a particular transcription factor such as an anti-TATA box binding protein antibody such that nucleotide sequences bound to the TATA box binding protein are isolated. The isolated sequences then can be amplified and examined for transcriptional regulatory activity using a method as disclosed herein. Similarly, transcriptionally active regions of genomic DNA can be obtained using an antibody that specifically binds acetylated histone H4, which is associated with unwound regions of chromosomal DNA. Since such chromosomal regions are associated with transcriptional activity, this method provides a means to enrich for oligonucleotide sequences that are involved in transcriptional regulation. Methods and reagents for isolating transcriptionally active regions of chromosomal DNA are well known (see, for example, Orlando and Paro, Cell 75:1187-1198, 1993; and Holmes and Tjian, Science, 288:867-870, 2000, each of which is incorporated herein by reference) and commercially available (for example, anti-acetyl histone H4 antibody, Upstate Biotechnology; anti-TFIID (TATA binding protein) antibody, Santa Cruz Biotechnology).

[0059] Oligonucleotides to be examined for translational regulatory activity also can be, for example, cDNA sequences encoding 5′ UTRs of cellular mRNAs, including a library of such cDNA molecules. Furthermore, as disclosed herein, translational regulatory elements identified according to a method of the invention, including synthetic IRES elements, have been found to be complementary to oligonucleotide sequences of ribosomal RNA (rRNA; see FIG. 6), particularly to un-base paired oligonucleotide sequences of rRNA, which are interspersed among double stranded regions that form due to hybridization of self-complementary sequences within rRNA (see FIG. 7B). Accordingly, oligonucleotides to be examined for translational regulatory activity, including IRES activity, can be designed based on their being complementary to an oligonucleotide sequence of rRNA, particularly to an un-base paired oligonucleotide sequence of rRNA such as a yeast, mouse or human rRNA (SEQ ID NOS: 110, 111 or 112, respectively; see, also, GenBank Accession Nos. V01335, X00686, X03205, respectively, each of which is incorporated herein by reference). In addition, oligonucleotides to be examined for translational regulatory activity can be a library of variegated oligonucleotide sequences (see, for example, U.S. Pat. No. 5,837,500), which can be based, for example, on a translational regulatory element as disclosed herein or identified using a method of the invention, or on an oligonucleotide sequence complementary to an un-base paired region of a rRNA.
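
By way of a hedged illustration, candidate oligonucleotides complementary to a region of rRNA can be enumerated by taking the reverse complement of each window along that region. The sketch below assumes strict Watson-Crick pairing only (GU pairing is ignored), a window of 9 nucleotides, and a made-up toy sequence; it does not use any sequence of the SEQ ID NOS recited above, and the function names are hypothetical.

```python
# Watson-Crick RNA base pairs only; GU wobble pairing is deliberately ignored here.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement_rna(seq):
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def complementary_candidates(rrna_region, length):
    """Yield (start, candidate) pairs, one per window of the rRNA region.

    Each candidate is the sequence, written 5' to 3', that would base pair
    with rrna_region[start:start + length].
    """
    for start in range(len(rrna_region) - length + 1):
        window = rrna_region[start:start + length]
        yield start, reverse_complement_rna(window)

# Toy sequence invented for this example; not a real rRNA sequence.
toy_region = "GGAUCCUGCCAG"
candidates = list(complementary_candidates(toy_region, 9))

for start, candidate in candidates:
    print(start, candidate)
```

In practice the input would be restricted to regions predicted to be un-base paired, which this sketch leaves to the user.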

[0060] The effect of an introduced oligonucleotide on transcription of the reporter molecule can be examined in vitro or in vivo, including in a cell in culture or in a cell in an organism. Generally, the expression of the reporter molecule from the minimal promoter is determined, then the effect of an introduced oligonucleotide on the level of expression is determined. Expression from the minimal promoter can be determined prior to introducing the element or can be determined in a parallel study. For example, an in vitro transcription reaction can be used to determine the level of expression of the reporter in the presence or absence of the oligonucleotide, wherein a difference in the levels of expression indicates that the oligonucleotide has transcriptional regulatory activity. In one embodiment, the in vitro transcription reactions are performed in a high throughput format, for example, in the wells of a plate or in discrete identifiable positions in a microarray, for example, on a silicon wafer or glass slide or the like.
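
The comparison described above, reporter expression with the oligonucleotide versus expression from the minimal promoter alone, can be expressed as a simple ratio. The following sketch is illustrative only: the two-fold threshold, the function name, and the numeric readings are assumptions and are not values taken from this disclosure.

```python
def classify_oligo(with_oligo, without_oligo, min_fold=2.0):
    """Compare reporter levels measured with and without the oligonucleotide.

    Returns (fold_change, label). min_fold is an assumed cutoff for calling
    a difference; a real assay would set it from replicate variability.
    """
    if without_oligo <= 0 or with_oligo < 0:
        raise ValueError("reporter levels must be positive (baseline) and non-negative")
    fold = with_oligo / without_oligo
    if fold >= min_fold:
        return fold, "increases expression"
    if fold <= 1.0 / min_fold:
        return fold, "decreases expression"
    return fold, "no clear change"

# Hypothetical reporter readings in arbitrary units.
print(classify_oligo(400.0, 100.0))
print(classify_oligo(25.0, 100.0))
print(classify_oligo(110.0, 100.0))
```

The baseline reading can come from a measurement made before the element is introduced or from a parallel control, as the paragraph above notes.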

[0061] In another embodiment, the oligonucleotide is examined in a cell, particularly a eukaryotic cell, which can be a cell in culture or a cell in an organism, for example, a transgenic non-human eukaryotic organism. The construct comprising the oligonucleotide to be examined operatively linked to the reporter cassette and regulatory cassette is introduced into the cell by any of various transfection methods. Preferably, the construct is contained in a vector, which generally is an expression vector, although the elements required for expression also can be part of the construct. Eukaryotic cells are transfected with a library of different vectors to form transfected eukaryotic host cells. Transfection can be performed using methods as disclosed herein or otherwise known in the art. In a particular embodiment, the construct comprising the reporter and regulatory cassettes is contained in a viral vector such as a retroviral vector, which is introduced into a target cell by viral infection. The transfected cells then can be cultured under conditions suitable for the vector to integrate into the host cell, and for the reporter molecule to be expressed if the oligonucleotide has transcriptional regulatory activity. A selection step then can be performed such that cells expressing the reporter molecule are identifiable, and the regulatory sequence in the selected cells can be isolated.

[0062] A method of identifying a translational regulatory element, including a synthetic translational enhancer or a synthetic IRES sequence, can be performed similarly. As disclosed herein, a method of the invention provides a means to identify a translational regulatory element that can enhance the level of translation or can reduce or inhibit the level of translation of an operatively linked expressible polynucleotide. A translational enhancer or inhibitor can be identified, for example, by operatively linking the oligonucleotide to be examined for translational regulatory activity to an expressible polynucleotide, which can, in turn, be operatively linked to a strong promoter, wherein an increase or decrease in the level of translation in the presence of the oligonucleotide as compared to its absence identifies the oligonucleotide as a translational regulatory element. The construct comprising the oligonucleotide to be examined and the regulatory and reporter cassettes, which can be in a vector such as an expression vector, can include a dicistronic reporter cassette, which is operatively linked to a regulatory cassette comprising a strong promoter element. The dicistronic reporter cassette contains a first nucleotide sequence encoding a first reporter molecule and a second nucleotide sequence, which is operatively linked to the first nucleotide sequence and encodes a second reporter protein, which is different from the first reporter protein. The reporter cassette functions to indicate (report) that the first or second reporter protein or both have been expressed, by means of transcription and translation of the nucleotide sequences encoding the first and second reporter proteins.

[0063] The first and second nucleotide sequences in the dicistronic reporter cassette are separated by an intercistronic sequence, which facilitates the introduction and operative linkage of an oligonucleotide sequence to be examined for IRES or other translational regulatory activity. The intercistronic spacer nucleotide sequence generally contains a site for cloning the oligonucleotide sequence to be examined for translational regulatory activity, particularly IRES activity, in a position to effect translation of the second cistron. Upon introduction of a nucleotide sequence that functions as an IRES, the second nucleotide sequence (cistron) is translated to produce an expressed second reporter protein.

[0064] Following the rules for transcription of mRNA and translation of protein, the second nucleotide sequence of the dicistronic reporter cassette is located 3′ (downstream) from the termination codon for the first encoded protein, and 5′ (upstream) from the transcription termination and polyadenylation signals of the mRNA transcript. The result is a dicistronic construct which, upon transcription, forms an mRNA transcript that encodes two polypeptides, the first and second reporter molecules.
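
The ordering constraint just described (first cistron, then intercistronic spacer, then second cistron, then polyadenylation signal, reading 5′ to 3′) can be captured as a small layout check. This is a sketch under stated assumptions: the `Feature` class, the function name, and all coordinates are hypothetical and do not correspond to any vector of the invention.

```python
from dataclasses import dataclass

@dataclass
class Feature:
    """A named region of a construct; coordinates are 0-based, end exclusive."""
    name: str
    start: int
    end: int

def is_valid_dicistronic(first_cistron, spacer, second_cistron, polya_signal):
    """Check that the four features are non-empty and ordered 5' to 3' without overlap."""
    ordered = [first_cistron, spacer, second_cistron, polya_signal]
    non_empty = all(f.start < f.end for f in ordered)
    in_order = all(a.end <= b.start for a, b in zip(ordered, ordered[1:]))
    return non_empty and in_order

# Hypothetical coordinates chosen only to exercise the check.
first = Feature("first reporter", 100, 820)
spacer = Feature("intercistronic spacer", 820, 900)
second = Feature("second reporter", 900, 1620)
polya = Feature("polyadenylation signal", 1700, 1750)

print(is_valid_dicistronic(first, spacer, second, polya))
```

A layout in which the second cistron began before the end of the first would fail the check, reflecting the requirement that it lie downstream of the first termination codon.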

[0065] Currently, no general methodology exists for synthesizing, selecting, and varying the content of transcriptional or translational regulatory elements in the context of a eukaryotic chromosome. Moreover, there is relatively little information as to whether either natural or synthetic promoters, when coupled to a fluorescent marker, can be used to sort cells that may be characteristic of a particular phenotype. However, methods have been reported that are either related to the disclosed regulatory element selection technique or represent attempts at making synthetic promoters. For example, Li et al. (Nature Biotechnol. 17:241-245, 1999) describe building synthetic promoters that function in muscle cells. These myogenic promoters were made one at a time by multimerizing known elements such as the E-box, the serum response element (SRE), and the binding site for MEF-1 (a muscle-specific transcription factor) into arrays. Various combinations of these sites were then cloned upstream of a minimal promoter and luciferase gene cassette, and transfected individually into cell lines derived from muscle in order to score their relative promoter activity. Eventually, after screening several of these luciferase constructs, a panel of “super-promoters”, which work better than the promoters from known muscle-specific genes, was assembled. However, Li et al. do not describe an EGFP/FACS sorting technique. As such, an advantage of the present invention is that one can screen over a million candidates prior to confirming their activity in a luciferase system, whereas the promoter technique described by Li et al. merely makes promoters and analyzes their activity one at a time.

[0066] Asoh et al. (Proc. Natl. Acad. Sci., USA 91:6982-6986, 1994) described a technique for cloning random fragments of genomic DNA in a polyoma virus in order to up-regulate the expression of the large T antigen. This assay for enhancer activity was based on the ability of the virus to replicate more efficiently, and the activity of putative enhancer elements was scored by increased neomycin resistance. The rationale of this method is that an active enhancer sequence would increase the ability of an enhancerless polyoma virus to replicate, and this would be scored as a neomycin resistant cell. However, the selection system of Asoh et al. differs from the present invention in that increased viral replication is selected for rather than enhanced transcription. Furthermore, there is no testing of these sequences for promoter activity in an independent system.

[0067] Others have described using the DNA binding properties of promoter elements to develop techniques that isolate elements using nuclear extracts from cells. Such techniques select motifs based on their ability to bind proteins. These techniques allow for pre-selecting sequences that have binding activity as a basis for further testing of such selected sequences for promoter activity. Previous work describes such an enrichment of DNA binding elements, including the CAST method (Funk et al., Proc. Natl. Acad. Sci. USA 89:9484-9488, 1992; Gruffat et al., Nucl. Acids Res. 22:1172-1178, 1994), the MuST method (Nallur et al., Proc. Natl. Acad. Sci., USA 93:1184-1189, 1996) and the FROGS method (Mead et al., Proc. Natl. Acad. Sci., USA 95:11251-11256, 1998). The CAST technique was one of the first methods used to isolate DNA binding sites from a pool of random DNA sequences using the gel mobility shift assay. The MuST technique is a multiplex selection approach, in which a library of potential DNA binding elements that may function in gene transcription is subjected to one or more rounds of protein binding using nuclear extracts from different mammalian cell types. This assay gives a profile of all the elements that are capable of binding nuclear factors and represents an extremely useful “up-front” procedure that would complement the presently disclosed selection approach.

[0068] The CAST and MuST techniques, however, fall short of the presently disclosed methods in that CAST and MuST do not provide an activity assay to demonstrate whether the elements that are selected in such DNA binding procedures function to regulate transcription in the cells from which the nuclear extracts are prepared. The FROGS technique is similar to CAST and MuST, exploiting the advantage of selecting only those elements that bind to proteins. As such, these methods do not test the selected elements for regulatory activity, and are biased against finding elements that can function as regulatory elements but do not actually bind to proteins.

[0069] Another method, NOMAD (Rebatchouk et al., Proc. Natl. Acad. Sci. USA 93:10891-10896, 1996), involves the design of a modular reporter vector system that is applied to the enterprise of shuffling promoter elements in order to determine the effects of ordering, spacing, and inversions of such elements on promoter activity. The goal of the NOMAD procedure is to provide extreme flexibility in the ability to clone DNA in a directional fashion and also to easily modify and rearrange these sequences. Thus, the NOMAD vector system provides an alternative to the disclosed successive element ligation procedure used to ligate promoter elements in a defined order and polarity.

[0070] Dirks et al., U.S. Pat. No. 6,060,273, describe methods and compositions for identifying IRES elements. Although Dirks et al. describe IRES nucleotide sequences of viral, cellular or synthetic origin, they appear to refer only to synthesized nucleotide sequences as compared to those isolated from a biological source, and do not disclose screening synthetic oligonucleotides such as a library of random oligonucleotides as disclosed herein. Singer et al. (Genes Devel. 4:636-645, 1990) describe a method for selecting a basal promoter in yeast, but do not describe identifying cis enhancer elements or the use of a method such as FACS sorting. Bell et al. (Yeast 15:1747-1759, 1999) describe selection for yeast promoters using EGFP and FACS sorting, but do not describe screening random sequences for promoter activity.

[0071] A method of the invention can be useful for quickly and conveniently screening a large number of oligonucleotides to identify those having transcriptional or translational regulatory activity. For example, a library of randomized oligonucleotides can be cloned into multiple vectors comprising the dicistronic reporter cassette such that the oligonucleotides are operatively linked by insertion into the spacer sequence in a position to function as an IRES and initiate translation of the second reporter protein. Eukaryotic cells can be transfected with the library of different vectors to form transfected eukaryotic host cells, in which the vector can integrate into the host cell genome and in which an oligonucleotide having IRES activity, for example, can affect the level of expression of the second reporter molecule. Transfected cells expressing the reporter molecules then can be selected based on expression of the reporter molecule and the identified IRES oligonucleotide sequence can be isolated.

[0072] The oligonucleotides identified herein as having transcriptional or translational regulatory activity provide modules that can be used alone or combined with each other to produce desired activities. For example, concatemers of the identified IRES elements can vastly increase polypeptide expression from an associated cistron, including concatemers of 2, 5, 10, 20, 35, 50 or 75 copies of an IRES element, which independently can be multiple copies of the same or different IRES elements, and which can be operatively linked adjacent to each other or separated by spacer nucleotide sequences that can vary from 1 to about 100 nucleotides in length. The capacity to drive high levels of protein expression has many applications for large scale protein production as, for example, in bulk manufacturing of drugs such as those produced in the biotechnology industry, nutritional proteins, industrial enzymes, and the like. Furthermore, when present in polycistronic constructs, IRES elements can be used to co-express proteins in a cell. For example, a dicistronic construct can contain a first cistron that encodes a polypeptide of interest such as a polypeptide drug or the like and a second cistron encoding a reporter polypeptide, which is expressed from an IRES element. Such a construct provides a means to select cells that contain the first cistron, which encodes the polypeptide of interest, thus minimizing the presence of contaminating cells that do not express the polypeptide and facilitating isolation of the polypeptide.
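
As a hedged illustration of the concatemer arrangement described above, the sketch below joins copies of one or more elements, either directly adjacent or separated by a spacer. The element and spacer strings are invented placeholders rather than sequences of the invention, the function name is hypothetical, and the 100 nucleotide limit is treated as a hard cap only for simplicity (the text says "about 100").

```python
def build_concatemer(elements, spacer=""):
    """Join regulatory elements 5' to 3', optionally separated by a spacer.

    elements: list of sequences (same or different elements, in order).
    spacer:   sequence placed between neighbors; empty means directly adjacent.
    """
    if not elements:
        raise ValueError("at least one element is required")
    if len(spacer) > 100:
        raise ValueError("spacer is limited to 100 nucleotides in this sketch")
    return spacer.join(elements)

# Placeholder sequences invented for illustration.
element = "CCGGCGGGU"
five_copies = build_concatemer([element] * 5, spacer="AAAAAAAAA")
adjacent = build_concatemer([element] * 2)

print(len(five_copies), five_copies)
print(adjacent)
```

Passing a list rather than a copy count keeps the helper usable for concatemers that mix different elements.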

[0073] The disclosed elements also can bind to cellular factors; for example, an IRES element can bind ribosomes in a cell, thus modifying or inhibiting its translational activity. As such, the elements can be used to modulate (or inhibit) transcription or translation of a gene product, for example, during an industrial process or as part of a therapeutic procedure. In particular, the elements can be used as a genetic “toxin” to inhibit specific transcription or translation in a target cell. As disclosed herein, a translational regulatory element identified according to a method of the invention as having translational enhancing activity can reduce the level of translation when introduced into a cell. While no mechanism for this action is proposed herein or, in fact, relevant to using such an element to affect translational activity in a cell, one possibility is that the element can bind to and sequester trans-acting translational regulatory factors such as eukaryotic initiation factors or the like, similar to effects seen with transcriptional regulatory elements when introduced into cells, or can bind to rRNA such that the rRNA is unavailable to effect translation. Thus, by introducing a translational regulatory element having translational enhancing activity or IRES activity into a eukaryotic cell, the translational activity in the eukaryotic cell can be reduced or inhibited. Conversely, by introducing a translational regulatory element having translational inhibitory activity into a eukaryotic cell, translational activity in the cell is increased due, for example, to the sequestering of a trans-acting factor that otherwise binds to an endogenous translational inhibitory sequence in the cell to inhibit translation.

[0074] A dicistronic reporter cassette can be used for identifying a transcriptional or translational regulatory element, depending on the particular configuration as disclosed herein. For example, for identifying a transcriptional regulatory element according to a method of the invention, the dicistronic reporter cassette can contain a defined IRES element in the intercistronic spacer sequence, and the dicistronic reporter cassette is operatively linked, generally, to a minimal promoter element such that, upon introduction of a nucleotide sequence having transcriptional regulatory activity, transcription of the dicistronic cassette occurs. As compared to the level of transcription of the dicistronic reporter cassette in the absence of an oligonucleotide to be examined for transcriptional regulatory activity, the level of transcription can increase due to the oligonucleotide or can decrease due to the oligonucleotide. Since the promoter for the dicistronic reporter cassette is a minimal promoter, it can be difficult to identify a decrease in transcriptional activity due to the oligonucleotide. However, the ability of the oligonucleotide to decrease transcriptional activity, for example, to act as a silencer, can be confirmed by examining the effect of the oligonucleotide on a corresponding construct having a strong promoter, for example, an RSV promoter, in place of the minimal promoter.

[0075] In comparison, for identifying an IRES element according to a method of the invention, the dicistronic reporter cassette is operatively linked, generally, to a strong promoter, and the oligonucleotide sequence to be examined for IRES activity is introduced into the spacer sequence between the first and second cistron. The use of a dicistronic reporter cassette, which allows for the sequential selection of cells expressing the first reporter molecule followed by selection of cells expressing the second reporter molecule, provides an additional level of confirmation that regulation of expression arises due to the contribution of the regulatory oligonucleotide and not, for example, due to an artifact, such as rearrangement of the vector sequences during transfection to produce a functional promoter or functional IRES, or other event that can lead to expression of the reporter molecule outside the control of the introduced regulatory oligonucleotide and the promoter element of the vector.

[0076] A dicistronic reporter cassette for identifying a transcriptional regulatory element, for example, can allow for antibiotic selection (puromycin) as a first (or second) reporter selection, followed (or preceded) by fluorescence-activated cell sorting (FACS) selection using a fluorescent reporter such as enhanced green fluorescent protein (EGFP). A dicistronic reporter cassette for identifying an IRES element, for example, can allow for FACS with EGFP as a first reporter selection, followed by a second FACS selection using enhanced cyan fluorescent protein (ECFP) as the second reporter selection. Other combinations of reporter molecules are disclosed herein or can otherwise be selected by the skilled artisan depending, for example, on cost, convenience or availability of the reporter molecule or the means for identifying (detecting) its expression.

[0077] A synthetic transcriptional or translational regulatory element can be identified by screening, for example, a library of oligonucleotides containing a large number of different nucleotide sequences. The oligonucleotides can be variegated oligonucleotide sequences, which are based on but different from a known transcriptional or translational regulatory element, for example, an oligonucleotide complementary to an un-base paired sequence of a rRNA, or can be a random oligonucleotide library. The use of randomized oligonucleotides provides the advantage that no prior knowledge is required of the nucleotide sequence, and provides the additional advantage that completely new regulatory elements can be identified. Methods for making a combinatorial library of nucleotide sequences or a variegated population of nucleotide sequences or the like are well known in the art (see, for example, U.S. Pat. No. 5,837,500; U.S. Pat. No. 5,622,699; U.S. Pat. No. 5,206,347; Scott and Smith, Science 249:386-390, 1990; Markland et al., Gene 109:13-19, 1991; O'Connell et al., Proc. Natl. Acad. Sci., USA 93:5883-5887, 1996; Tuerk and Gold, Science 249:505-510, 1990; Gold et al., Ann. Rev. Biochem. 64:763-797, 1995; each of which is incorporated herein by reference).

[0078] A regulatory element can be of various lengths from a few nucleotides to several hundred nucleotides. Thus, the length of an oligonucleotide in a library of oligonucleotides to be screened can be any length, including oligonucleotides as short as about 6 nucleotides or as long as about 100 nucleotides or more. Generally, the oligonucleotides to be examined are about 6, 12, 18, 30 nucleotides or the like in length. The complexity of the library, i.e., the number of unique members, also can vary, although preferably the library has a high complexity so as to increase the likelihood that regulatory sequences are present. Libraries can be made using any method known in the art, including, for example, using an oligonucleotide synthesizer and standard oligonucleotide synthetic chemistry. Where the oligonucleotides are to be incorporated into a vector, the library complexity depends in part on the size of the expression vector population being used to clone the random library and transfect cells. Thus, a theoretical limitation for the complexity of the library also relates to utilization of the library content by the recipient expression vector and by the transfected cells, as well as by the complexity that can be obtained using a particular method of oligonucleotide synthesis.
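The relationship between oligonucleotide length and library complexity can be illustrated by a short calculation. For a fully randomized oligonucleotide of length n composed of four nucleotides, the number of possible unique sequences is 4^n. The sketch below also bounds the complexity that can actually be screened by the number of independent vector clones and the number of transfected cells; treating the screened complexity as the minimum of these quantities is a simplifying assumption made here for illustration, and the function names and the clone and cell counts are hypothetical.

```python
def theoretical_complexity(length, alphabet_size=4):
    """Number of unique sequences in a fully randomized library."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return alphabet_size ** length


def screened_complexity(length, n_vector_clones, n_transfected_cells):
    """Upper bound on the unique members that can be examined.

    Simplifying assumption: each vector clone and each transfected cell
    carries at most one library member, so the bound is the smallest of
    the three quantities.
    """
    return min(theoretical_complexity(length),
               n_vector_clones,
               n_transfected_cells)


if __name__ == "__main__":
    for n in (6, 12, 18, 30):
        print(n, theoretical_complexity(n))
    # Hypothetical experiment: 10**7 vector clones, 10**6 transfected cells.
    print(screened_complexity(18, 10**7, 10**6))
```

Under these assumptions, a library of 6 nucleotide oligonucleotides (4,096 possible sequences) can be sampled exhaustively, whereas a library of 18 nucleotide oligonucleotides (about 6.9 × 10^10 possible sequences) is limited by the number of vector clones and transfected cells rather than by oligonucleotide synthesis.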

[0079] A reporter cassette useful for identifying a transcriptional or translational regulatory element is a module that includes one or more nucleotide sequences encoding one or more reporter molecules, respectively. The reporter cassette is operatively linked to an adjacent regulatory cassette such that expression of the reporter cassette is under the control of the regulatory cassette. The term “cassette” is used herein to refer to a nucleotide sequence that can be easily and conveniently manipulated by recombinant DNA methods such that it can be linked, including operatively linked, to one or more other nucleotide sequences or can be inserted into or removed from a vector. For example, a cassette can include restriction endonuclease recognition and cleavage sites or recombinase recognition and cleavage sites, which provide a means for conveniently manipulating the cassette, for example, by insertion into a vector.

[0080] As used herein, the term “reporter cassette” refers to a nucleotide sequence that includes the signals for encoding a complete reporter gene product, including the signals for initiation of translation, nucleotides encoding the structural protein, translation termination codons, and 3′ sequence information to ensure a functional mRNA transcript can be produced following activation of transcription of an mRNA. As disclosed herein, a reporter cassette can be monocistronic, wherein it encodes a single reporter molecule, can be dicistronic, wherein it encodes two reporter molecules, or can be polycistronic, wherein it contains more than two cistrons.

[0081] For the isolation of synthetic transcriptional regulatory elements, the reporter cassette generally is monocistronic or dicistronic and, when dicistronic, contains an IRES element in the intercistronic spacer sequence between the cistrons encoding the reporter molecules. For the isolation of synthetic IRES sequences, the reporter cassette generally is a dicistronic reporter cassette, wherein the oligonucleotide to be examined for IRES activity is introduced into the intercistronic spacer sequence, which otherwise lacks an IRES element. In a dicistronic reporter cassette, the second nucleotide sequence encoding a second reporter protein is operatively linked to the first nucleotide sequence encoding the first reporter protein. The first and second coding sequences are separated by an intercistronic spacer nucleotide sequence, into which an oligonucleotide sequence to be examined for IRES activity can be introduced in operative linkage to the second coding sequence.

[0082] An oligonucleotide to be examined for transcriptional or translational regulatory activity can be operatively linked, as appropriate, using any recombinant DNA methodology for combining nucleotide sequences. The method can vary depending upon the particular nucleotide sequences, including whether the cassettes are contained within a vector. Particularly useful methods for inserting an oligonucleotide in operative linkage include the use of restriction endonucleases, for example, by including a restriction endonuclease recognition site or multiple cloning site in appropriate proximity to the regulatory or reporter cassette of interest and flanking the oligonucleotide to be introduced therein, or by including a site specific recombinase recognition site such as a topoisomerase recognition site, a lox site, or an att site at the appropriate location. By contacting the nucleotide sequences in the presence of the appropriate enzyme, i.e., a restriction endonuclease, topoisomerase, Cre recombinase, Int recombinase, or the like, the oligonucleotide can be operatively linked with respect to the regulatory and reporter cassettes.

[0083] The reporter molecules generally are polypeptides that can be expressed under the conditions of the assay being utilized and the expression of which is detectable. Where a method of the invention is performed in a cell, for example, the reporter molecule can confer a detectable or selectable phenotype on cells expressing the molecule. In a method utilizing a dicistronic reporter cassette, the encoded first and second reporter proteins generally are different from each other, thus providing independent selection criteria. Reporter molecules, also referred to as selectable markers, are well known in the art and include a fluorescent protein such as green fluorescent protein (GFP) and enhanced and modified forms of GFP; an enzyme such as β-galactosidase, chloramphenicol acetyltransferase, luciferase, or alkaline phosphatase; an antibiotic resistance protein such as puromycin N-acetyltransferase, hygromycin B phosphotransferase, neomycin (aminoglycoside) phosphotransferase, or the Zeocin® gene product (Stratagene); a cell surface protein marker such as N-CAM or a polypeptide that is expressed on a cell surface and has been modified to contain a tag peptide such as a polyhistidine sequence (e.g., hexahistidine), a V5 epitope, a c-myc epitope, a hemagglutinin A epitope, a FLAG epitope, or the like.

[0084] Expression of the reporter molecule can be detected using the appropriate reagent, for example, by detecting light emission upon addition of luciferin to a luciferase reporter molecule, or by detecting binding of nickel ion to a polypeptide containing a polyhistidine tag. Furthermore, the reporter molecule can provide a means of isolating the expressed reporter molecule or a cell expressing the reporter molecule. For example, where the reporter molecule is a polypeptide that is expressed on a cell surface and that contains a c-myc epitope, an anti-c-myc epitope antibody can be immobilized on a solid matrix and cells, some of which express the tagged polypeptide, can be contacted with the matrix under conditions that allow selective binding of the antibody to the epitope. Unbound cells can be removed by washing the matrix, and bound cells, which express the reporter molecule, can be eluted and collected. Methods for detecting such reporter molecules and for isolating the molecules, or cells expressing the molecules, are well known to those in the art (see, for example, Hopp et al., BioTechnology 6:1204, 1988; U.S. Pat. No. 5,011,912; each of which is incorporated herein by reference).

[0085] Fluorescent reporter markers are particularly convenient for use in the compositions and methods of the invention because they allow the selection of cells containing the expressed reporter protein by fluorescence activated cell sorting (FACS). Similarly, proteins that confer antibiotic resistance are particularly useful as selectable markers because only cells expressing the antibiotic resistance protein can survive exposure to the particular antibiotic. Cell surface protein markers, which are expressed on the surface of a eukaryotic cell, represent a large class of proteins suitable for use as reporter proteins in the present invention. The surface marker can be selected, for example, using an antibody specific for the protein, or using a ligand (or receptor) that specifically interacts with and binds to the cognate cell surface receptor (or ligand). Cells expressing a cell surface marker can be isolated, for example, by a panning method, which utilizes immobilized antibodies (or ligands or receptors) that selectively bind to the cell surface marker, or by a FACS method, in which case the antibody or ligand is fluorescently labeled and, therefore, labels the cell expressing the cell surface marker by specifically binding to the marker. The cell adhesion molecule, N-CAM, is an example of a cell surface marker useful according to the present invention.

[0086] As disclosed herein, a reporter cassette can be operatively linked to a regulatory cassette, thereby providing a construct useful for identifying a transcriptional or translational regulatory element according to a method of the invention. Generally, the term “regulatory cassette” refers to a nucleotide sequence required for transcription of a reporter cassette. Thus, a regulatory cassette generally includes a promoter element, which can be a minimal promoter or strong promoter depending on the purpose for which a construct comprising the regulatory cassette is to be used, and can contain additional transcriptional regulatory elements, provided that the elements of the regulatory cassette do not interfere with the use of a construct comprising the regulatory cassette to identify a regulatory element according to a method of the invention.

[0087] A regulatory cassette useful in a method of identifying a transcriptional regulatory element, for example, is a nucleotide sequence comprising a minimal promoter element. In addition, the regulatory cassette can contain a sequence that facilitates introduction of an oligonucleotide to be examined for transcriptional activity into the regulatory cassette in an operatively linked manner. Such a sequence can be a restriction endonuclease recognition site, recombinase recognition site, and the like. A minimal promoter is a nucleotide sequence that allows initiation of transcription by RNA polymerase II, and can be up-regulated by operative linkage of a regulatory element, particularly an oligonucleotide transcriptional regulatory element according to the present invention. The regulatory cassette and operatively linked reporter cassette can be in an isolated form, or can be contained in a vector.

[0088] A regulatory cassette useful in a method of identifying an IRES element is a nucleotide sequence comprising a promoter element. Generally, but not necessarily, the promoter in such a regulatory cassette is a strong promoter, and preferably the construct comprising the regulatory cassette and operatively linked reporter cassette is contained in a vector. Since an oligonucleotide to be examined for translational regulatory activity must be transcribed, a site for introducing the oligonucleotide into the regulatory cassette/reporter cassette construct is positioned downstream of the transcription start site and, in one embodiment, is positioned in an intercistronic spacer sequence of a dicistronic reporter cassette.

[0089] An oligonucleotide having IRES activity generally is positioned in an intercistronic position, from which it can exert its translational activity, and, as disclosed herein, can be at various distances from the translation start site of the second cistron. An oligonucleotide to be examined for IRES activity can be many hundreds of nucleotides from the transcriptional promoter, which generally is positioned upstream (5′) of the first cistron of a dicistronic reporter cassette. As such, it should be recognized that such an oligonucleotide to be examined for translational regulatory activity is operatively linked to the second cistron such that an oligonucleotide having IRES activity can be identified by its effecting translation of the second cistron.

[0090] A promoter element generally acts as a substrate for RNA polymerase II, in combination with additional protein factors, to initiate transcription. A variety of promoter sequences are known in the art. Thus, promoters useful in a regulatory cassette as disclosed herein include the adenovirus promoter TATA box, an SP1 site (GGGCGG; SEQ ID NO: 4), a minimal enkephalin gene promoter (NEK), an SV40 early minimal promoter, a TRE/AP-1 element (TGACTCA; SEQ ID NO: 5), an erythroid cell GATA element (GATAGA; SEQ ID NO: 6), a myeloid tumor element NF-κB binding site (GGGAATTCCCC; SEQ ID NO: 7), a cyclic AMP response element (TGACGTCA; SEQ ID NO: 8), and the like. Because an active transcriptional promoter can comprise a variety of elements, the present invention can involve the use of a regulatory cassette with additional features so as to preferentially select regulatory oligonucleotides having an activity that depends upon the included feature. For example, the regulatory cassette can include a consensus transcription initiator sequence, or can include a transcription initiator sequence derived from a tissue specific gene, thereby increasing the tissue specificity of the selected regulatory oligonucleotide.
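The short sequence elements recited above can be located in a candidate regulatory cassette or oligonucleotide by a simple exact-match search. The following minimal sketch uses only the element sequences stated above; the function names, the dictionary keys, and the example input sequence are hypothetical. Exact matching on both strands is a simplification, since binding sites in practice tolerate sequence variation.

```python
# Element sequences as recited in the paragraph above.
MOTIFS = {
    "SP1": "GGGCGG",
    "TRE/AP-1": "TGACTCA",
    "GATA": "GATAGA",
    "NF-kB": "GGGAATTCCCC",
    "CRE": "TGACGTCA",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motifs(seq):
    """Return sorted (name, strand, start) tuples for exact motif matches.

    start is the 0-based position on the given (top) strand. Palindromic
    motifs are reported once, on the "+" strand.
    """
    seq = seq.upper()
    hits = []
    for name, motif in MOTIFS.items():
        patterns = [("+", motif)]
        rc = reverse_complement(motif)
        if rc != motif:
            patterns.append(("-", rc))
        for strand, pattern in patterns:
            start = seq.find(pattern)
            while start != -1:
                hits.append((name, strand, start))
                start = seq.find(pattern, start + 1)
    return sorted(hits, key=lambda h: (h[2], h[0], h[1]))


if __name__ == "__main__":
    # Hypothetical input sequence for illustration only.
    print(find_motifs("AAGGGCGGTTTGACGTCAAA"))
```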

[0091] As disclosed herein, a construct comprising a regulatory cassette operatively linked to a reporter cassette is useful for identifying transcriptional and translational regulatory elements. In one embodiment, the construct is contained in a vector, which generally is an expression vector that contains certain components, but otherwise can vary widely in sequence and in functional element content. In general, the vector contains a reporter cassette, which can be a dicistronic reporter cassette, operatively linked to a regulatory cassette, which contains a minimal promoter element or a strong promoter element, depending on the specific type of regulatory element that is to be identified. The vector also can contain sequences that facilitate recombinant DNA manipulations, including, for example, elements that allow propagation of the vector in a particular host cell (e.g., a bacterial cell, insect cell or mammalian cell), selection of cells containing the vector (e.g., antibiotic resistance genes for selection in bacterial or mammalian cells), and cloning sites for introduction of reporter genes or the elements to be examined (e.g., restriction endonuclease sites or recombinase recognition sites).

[0092] Preferably, the regulatory cassette and operatively linked reporter cassette, which can be monocistronic or dicistronic, are contained in an expression vector that is characterized, in part, in that it can integrate into a eukaryotic chromosome. Such a construct provides the advantage that the activity of an oligonucleotide can be examined in the context or milieu of the whole eukaryotic chromosome. A chromosome offers unique and complex regulatory features with respect to the control of gene expression, including translation. As such, it is advantageous to have a system and method for obtaining regulatory oligonucleotides that function in the context of a chromosome. Thus, a method of the invention can be practiced such that integration of the expression vector into the eukaryotic host cell chromosome occurs, forming a stable construct prior to selection for an expressed reporter molecule. Such a system provides a means to identify a regulatory element that effects its activity due, for example, to a conformational change in a chromosome such as a nucleosome unwinding or DNA bending event.

[0093] A construct comprising a regulatory cassette operatively linked to a reporter cassette, which can be contained in a vector, can be integrated into a chromosome by a variety of methods and under a variety of conditions. Thus, the present invention should not be construed as limited to the exemplified methods, for example, the use of an integrating retroviral vector. Shotgun transfection, for example, can result in stable integration if selection pressure is maintained upon the transfected cell through several generations of cell division, during which time the transfected nucleic acid construct becomes stably integrated into the cell genome. Directional vectors, which can integrate into a host cell chromosome and form a stable integrant, also can be used. These vectors can be based on targeted homologous recombination, which restricts the site of integration to regions of the chromosome having the homology, and can be based on viral vectors, which can randomly associate with the chromosome and form a stable integrant, or can utilize site specific recombination methods and reagents such as a lox-Cre system and the like.

[0094] Shotgun transfections can be accomplished by a variety of well known methods, including, for example, electroporation, calcium phosphate mediated transfection, DEAE dextran mediated transfection, a biolistic method, a lipofectin method, and the like. For random shotgun transfections, the culture conditions are maintained for several generations of cell division to ensure that a stable integration has resulted and, generally, a selective pressure also is applied. A viral vector based integration method also can be used and provides the advantage that the method is more rapid and establishes a stable integration by the first generation of cell division. A viral vector based integration also provides the advantage that the transfection (infection) can be performed at a low vector:cell ratio, which increases the probability of single copy transfection of the cell. A single copy expression vector in the cell during selection increases the reliability that an observed regulatory activity is due to a particular oligonucleotide, and facilitates isolation of such an oligonucleotide.
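The effect of the vector:cell ratio on the probability of single copy transfection can be illustrated under a simple statistical model. The sketch below assumes that the number of vector copies entering a cell follows a Poisson distribution with a mean equal to the vector:cell ratio; this model is an assumption made here for illustration and is not part of the disclosed method, and the function names are hypothetical.

```python
import math


def poisson_probability(k, mean):
    """Probability of exactly k vector copies per cell (Poisson model)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def single_copy_fraction(ratio):
    """Fraction of infected cells (one or more copies) with exactly one copy.

    ratio: vector:cell ratio, used as the Poisson mean (assumption).
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    infected = 1.0 - poisson_probability(0, ratio)
    return poisson_probability(1, ratio) / infected


if __name__ == "__main__":
    for ratio in (0.05, 0.1, 0.5, 1.0, 3.0):
        print(f"ratio {ratio}: {single_copy_fraction(ratio):.3f}")
```

Under this model, about 95% of infected cells carry a single copy at a vector:cell ratio of 0.1, as compared to about 58% at a ratio of 1, at the cost of a larger proportion of cells that receive no vector.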

[0095] A type C retrovirus viral vector is particularly useful for practicing a method of the invention. There are a variety of retroviral systems for infecting cells with genes. Methods for producing recombinant retrovirus particles suitable for introducing the expression vectors described herein are well known, and exemplary methods are described by Pear et al., Proc. Natl. Acad. Sci., USA, 90:8392-8396, 1993; Owens et al., Cancer Res., 58:2020-2028, 1998; and Gerstmayer et al., J. Virol. Meth., 81:71-75, 1999, each of which is incorporated herein by reference. Additional viral vectors suitable for use in the present invention include the lentivirus vector described by Chang et al., Gene Ther., 6:715-728 (1999); the spleen necrosis virus-derived vector described by Jiang et al., J. Virol., 72:10148-10156 (1998); and adenovirus-based vectors such as are described by Wang et al., Proc. Natl. Acad. Sci. USA, 93:3932-3926 (1996).

[0096] The invention also provides an isolated synthetic regulatory oligonucleotide having transcriptional or translational regulatory activity. Such an oligonucleotide can be used in a variety of gene expression configurations for regulating control of expression. A synthetic transcriptional regulatory oligonucleotide, which can be obtained by a method of the invention, can increase (enhance) or decrease (silence) the level of expression of a recombinant expression construct when operatively linked to a regulatory cassette comprising a minimal or other promoter element. Preferably, the regulatory oligonucleotide selectively regulates expression in a context specific manner, including, for example, in a cell or tissue specific manner, or with respect to a particular promoter or other effector sequences associated with a promoter.

[0097] A synthetic translational regulatory oligonucleotide, which can be obtained using a method of the invention, can increase or decrease the level of translation of an mRNA containing the oligonucleotide, and can have IRES activity, thereby allowing cap-independent translation of the mRNA. In particular, a translational regulatory oligonucleotide can selectively regulate translation in a context specific manner, depending, for example, on the cell type for expression, the nature of the IRES sequence, or the presence of other effector sequences in the expression construct.

[0098] Accordingly, the present invention provides an isolated synthetic transcriptional or translational regulatory oligonucleotide, which can be identified using the methods disclosed herein. As used herein, the term “isolated,” when used in reference to a regulatory oligonucleotide, indicates that the nucleotide sequence is in a form other than the form in which it is found in nature. Thus, an isolated regulatory oligonucleotide is separated, for example, from a gene in which it normally can be found in nature, and particularly from a chromosome in a cell. It should be recognized, however, that the regulatory oligonucleotide can comprise additional nucleotide or other sequences, yet still be considered “isolated” provided the construct comprising the regulatory oligonucleotide is not in a form that is found in nature. Thus, the oligonucleotide can be contained within a cloning vector or an expression vector, or can be operatively linked to a second nucleotide sequence, for example, another regulatory element or an expressible polynucleotide.

[0099] A regulatory oligonucleotide as disclosed herein also is referred to generally as a synthetic regulatory oligonucleotide, for example, a synthetic IRES. As used herein, the term “synthetic” indicates that oligonucleotides that can be screened using the disclosed methods can be produced using routine chemical or biochemical methods of nucleic acid synthesis. It should be recognized, however, that screening of synthetic randomized oligonucleotide libraries can identify regulatory elements that correspond to portions of nucleotide sequences found in genes in nature. Nevertheless, such oligonucleotides generally are present in an isolated form and, therefore, cannot be construed to be products of nature. As disclosed herein, the methods of the invention can identify previously known regulatory elements, including, for example, binding sites for the transcription factors SP1, AP1, NF-κB, CREB, zeste and glucocorticoid receptor (see Tables 1 and 2). It should be recognized that such previously known regulatory elements are not considered to be within the scope of compositions encompassed within the present invention.

[0100] The term “oligonucleotide”, “polynucleotide” or “nucleotide sequence” is used broadly herein to mean a sequence of two or more deoxyribonucleotides or ribonucleotides that are linked together by a phosphodiester bond. As such, the terms include RNA and DNA, which can be a gene or a portion thereof, a cDNA, a synthetic polydeoxyribonucleic acid sequence or polyribonucleic acid sequence, or the like, and can be single stranded or double stranded, as well as a DNA/RNA hybrid. Furthermore, the terms “oligonucleotide”, “polynucleotide” and “nucleotide sequence” include naturally occurring nucleic acid molecules, which can be isolated from a cell, as well as synthetic molecules, which can be prepared, for example, by methods of chemical synthesis or by enzymatic methods such as by the polymerase chain reaction (PCR).

[0101] Synthetic methods for preparing a nucleotide sequence include, for example, the phosphotriester and phosphodiester methods (see Narang et al., Meth. Enzymol. 68:90 (1979); U.S. Pat. No. 4,356,270, U.S. Pat. No. 4,458,066, U.S. Pat. No. 4,416,988, U.S. Pat. No. 4,293,652; and Brown et al., Meth. Enzymol. 68:109 (1979), each of which is incorporated herein by reference). In various embodiments, an oligonucleotide of the invention or a polynucleotide useful in a method of the invention can contain nucleoside or nucleotide analogs, or a backbone bond other than a phosphodiester bond.

[0102] For convenience of discussion, the term “oligonucleotide” generally is used to refer to a nucleotide sequence that is being examined for transcriptional or translational regulatory activity, whereas the term “polynucleotide” or “nucleotide sequence” generally refers to a sequence that encodes a peptide or polypeptide, acts as or encodes a desired regulatory element, provides a spacer sequence or cloning site, or the like. It should be recognized, however, that such a use is only for convenience and is not intended to suggest any particular length or other physical, chemical, or biological characteristic of the nucleic acid molecule.

[0103] The nucleotides comprising an oligonucleotide (polynucleotide) generally are naturally occurring deoxyribonucleotides, such as adenine, cytosine, guanine or thymine linked to 2′-deoxyribose, or ribonucleotides such as adenine, cytosine, guanine or uracil linked to ribose. However, a polynucleotide also can contain nucleotide analogs, including non-naturally occurring synthetic nucleotides or modified naturally occurring nucleotides. Such nucleotide analogs are well known in the art and commercially available, as are polynucleotides containing such nucleotide analogs (Lin et al., Nucl. Acids Res. 22:5220-5234 (1994); Jellinek et al., Biochemistry 34:11363-11372 (1995); Pagratis et al., Nature Biotechnol. 15:68-73 (1997), each of which is incorporated herein by reference).

[0104] The covalent bond linking the nucleotides of an oligonucleotide or polynucleotide generally is a phosphodiester bond. However, the covalent bond also can be any of numerous other bonds, including a thiodiester bond, a phosphorothioate bond, a peptide-like bond or any other bond known to those in the art as useful for linking nucleotides to produce synthetic polynucleotides (see, for example, Tam et al., Nucl. Acids Res. 22:977-986 (1994); Ecker and Crooke, BioTechnology 13:351-360 (1995), each of which is incorporated herein by reference). The incorporation of non-naturally occurring nucleotide analogs or bonds linking the nucleotides or analogs can be particularly useful where the nucleotide sequence is to be exposed to an environment that can contain a nucleolytic activity, including, for example, a tissue culture medium or upon administration to a living subject, since the modified nucleotide sequences can be less susceptible to degradation.

[0105] A polynucleotide comprising naturally occurring nucleotides and phosphodiester bonds can be chemically synthesized or can be produced using recombinant DNA methods, using an appropriate polynucleotide as a template. In comparison, a polynucleotide comprising nucleotide analogs or covalent bonds other than phosphodiester bonds generally is chemically synthesized, although an enzyme such as T7 polymerase can incorporate certain types of nucleotide analogs into a polynucleotide and, therefore, can be used to produce such a polynucleotide recombinantly from an appropriate template (Jellinek et al., supra, 1995).

[0106] The present invention also provides an expression vector, which is useful for identifying a transcriptional or translational regulatory oligonucleotide according to the present invention. A vector useful for identifying a transcriptional regulatory oligonucleotide generally contains a reporter cassette, which includes a nucleotide sequence encoding at least one reporter molecule, and a regulatory cassette, which is operatively linked to the reporter cassette and comprises a minimal promoter element. The construct comprising the regulatory and reporter cassettes also generally contains a site for introducing an oligonucleotide to be examined for transcriptional or translational regulatory activity into the construct in an operatively linked manner. The reporter cassette generally does not contain a promoter for regulating transcription of the reporter gene, and the regulatory cassette generally is operatively linked to the reporter cassette such that expression of the reporter gene is regulated by the regulatory cassette. As such, various regulatory cassettes and reporter cassettes conveniently can be substituted into the vector, as desired. In one embodiment, the reporter cassette comprises a dicistronic construct, which includes first and second cistrons, which encode two different reporter molecules. Preferably, the nucleotide sequences encoding the first and second reporter molecules are operatively linked by a spacer nucleotide sequence that contains an IRES, or contains a site that facilitates insertion of an oligonucleotide to be examined for IRES activity in an operatively linked manner.

[0107] A vector useful for identifying an oligonucleotide having IRES activity generally contains a dicistronic reporter cassette, which includes first and second nucleotide sequences that encode respective first and second reporter proteins, and a regulatory cassette operatively linked to the dicistronic reporter cassette. The dicistronic reporter cassette further contains an intervening (intercistronic) spacer nucleotide sequence between the first and second encoding nucleotide sequences; the spacer nucleotide sequence generally contains a sequence that facilitates insertion of an oligonucleotide to be examined for IRES activity, for example, a cloning site, generally a multiple cloning site comprising one or more unique restriction enzyme recognition sites or a recombinase recognition site to facilitate insertion of the oligonucleotide sequence. Such a vector is useful for identifying an IRES by detecting a change in the level of expression of the second reporter. As disclosed herein, an IRES also can have translational enhancing activity or translation inhibitory activity, which can be conveniently detected using a monocistronic reporter cassette and detecting an increased or decreased level of translation, respectively, due to the oligonucleotide comprising the IRES.

[0108] In one embodiment, the expression vector is an integrating expression vector, which comprises nucleotide sequences that provide a means for stable integration of the regulatory and reporter cassettes into a chromosome of a eukaryotic host cell. Sequence elements that facilitate stable integration are disclosed herein or otherwise known in the art. Stable integration is conveniently effected using a retroviral based expression vector having the elements to facilitate packaging into an infectious retroviral particle and the elements to facilitate stable integration. These components can vary widely but, generally, the packaging elements comprise a truncated gag gene comprising sequences required for retrovirus packaging located within the expression vector nucleotide sequence, and the integration elements comprise an upstream long terminal repeat (LTR) and a downstream LTR positioned at the respective upstream and downstream flanks of the packaging element and the regulatory/reporter cassette elements. The upstream LTR preferably comprises an immediate early gene promoter, an R region, and a U5 region, as are well known in the retroviral and expression vector arts.

[0109] An integrating expression vector useful for identifying a transcriptional regulatory oligonucleotide generally contains an immediate early gene promoter that is derived from Rous sarcoma virus or cytomegalovirus, and the downstream LTR generally comprises a consensus transcription initiator sequence. Integrating expression vectors such as MESVR/EGFP*/IRESpacPro(ori) (SEQ ID NO: 1) and MESVR/EGFP*/IRESNCAMPro(ori) (SEQ ID NO: 9) as disclosed herein provide examples of integrating expression vectors useful for identifying a transcriptional regulatory oligonucleotide. However, as will be readily apparent, the various cassettes in the exemplified vectors can be substituted with other cassettes encoding, for example, reporter molecules having a desired characteristic, or comprising a desired promoter, enhancer, silencer or other regulatory element; or can be modified to contain a desirable cloning site, for example, by substituting a restriction endonuclease recognition site or multiple cloning site with a recombinase recognition site.

[0110] An integrating expression vector useful for identifying an oligonucleotide having IRES activity also generally contains an immediate early gene promoter that is derived from Rous sarcoma virus or cytomegalovirus, and the downstream LTR generally comprises a consensus transcription initiator sequence. An integrating expression vector such as MESVR/EGFP/ECFP/RSVPro (SEQ ID NO: 109) provides an example of an integrating expression vector useful for identifying an oligonucleotide having IRES activity. As above, however, various modifications and substitutions to the exemplified vector readily can be made using routine methods and commercially available reagents.

[0111] The present invention also provides a recombinant nucleic acid molecule comprising a transcriptional or translational regulatory element of the invention linked to a second heterologous polynucleotide. The term “second” is used herein in reference to a nucleotide sequence only to distinguish it from the nucleotide sequence comprising the regulatory oligonucleotide. The term “heterologous” is used herein in a relative sense to indicate that the second nucleotide sequence is not normally associated with the oligonucleotide comprising the regulatory element in nature (where the synthetic regulatory element corresponds to a regulatory element that exists in nature) or, if it is associated with the regulatory element in nature, is linked to the regulatory element such that the recombinant nucleic acid molecule is different from the corresponding sequence that exists in nature.

[0112] The second heterologous polynucleotide can be an expressible polynucleotide, which can encode an RNA of interest such as an antisense RNA molecule or a ribozyme, or can encode a polypeptide of interest, for example, a polypeptide to be expressed pursuant to a gene therapy procedure. Where the heterologous polynucleotide is an expressible polynucleotide, it generally is operatively linked to the synthetic regulatory oligonucleotide such that the oligonucleotide can effect its regulatory activity. The second heterologous polynucleotide also can comprise or encode one or more additional regulatory elements, which can be known promoter, enhancer, silencer or translational regulatory elements, including such elements that have been identified according to a method of the invention. A recombinant nucleic acid molecule comprising such a combination of regulatory elements can be useful for selectively expressing an RNA or polypeptide in a cell, which can be only one or a few different types of cell or any cell, and can be constitutively or inducibly expressed at a desired level.

[0113] The second heterologous polynucleotide also can be a vector, which can be a plasmid vector, viral vector or the like. Accordingly, the present invention also provides a vector comprising a regulatory oligonucleotide of the invention. Insofar as a regulatory oligonucleotide of the invention can be utilized in a variety of configurations for regulating gene expression or protein translation, the general structure of a vector of the invention requires only that it contain a regulatory oligonucleotide as disclosed herein. However, the vector also can contain nucleotide sequences that facilitate the introduction of an expressible polynucleotide or other nucleotide sequence into the vector, particularly such that it is operatively linked to the regulatory oligonucleotide. The vector also can contain other elements commonly contained in a vector, for example, a bacterial origin of replication, an antibiotic resistance gene for selection in bacteria, or corresponding elements for growing and selecting the vector in a eukaryotic cell.

[0114] The synthetic regulatory element in a vector can be designed such that it can readily be removed from the vector, for example, by treatment with a restriction endonuclease. Such a characteristic provides a means for developing a system comprising a vector and a plurality of synthetic regulatory oligonucleotides of the invention, any of which alone or in combination can be inserted into the vector. Accordingly, the present invention also provides a system, which can be in kit form, that provides one or more regulatory oligonucleotide sequences of the invention.

[0115] A kit of the invention can contain a packaging material, for example, a container having a regulatory oligonucleotide according to the invention and a label that indicates uses of the oligonucleotide for regulating transcription or translation of a polynucleotide in an expression vector or other expression construct. In one embodiment, the system, preferably in kit form, provides an integrating expression vector for use in selecting a regulatory oligonucleotide using a method as disclosed herein. Such a kit can contain a packaging material, which comprises a container having an integrating expression vector and a label that indicates uses of the vector for selecting oligonucleotide sequences capable of regulatory function.

[0116] Instructions for use of the packaged components also can be included in a kit of the invention. Such instructions for use generally include a tangible expression describing the components, for example, a regulatory oligonucleotide, including its concentration and sequence characteristics, and can include a method parameter such as the manner by which the reagent can be utilized for its intended purpose. The reagents, including the oligonucleotide, which can be contained in a vector or operably linked to an expressible polynucleotide, can be provided in solution, as a liquid dispersion, or as a substantially dry powder, for example, in a lyophilized form. The packaging materials can be any materials customarily utilized in kits or systems, for example, materials that facilitate manipulation of the regulatory oligonucleotides and, if present, of the vector, which can be an expression vector. The package can be any type of package, including a solid matrix or material such as glass, plastic (e.g., polyethylene, polypropylene and polycarbonate), paper, foil, or the like, which can hold within fixed limits a reagent such as a regulatory oligonucleotide or vector. Thus, for example, a package can be a bottle, vial, plastic or plastic-foil laminated envelope, or similar container used to contain a contemplated reagent. The package also can comprise one or more containers for holding different components of the kit.

[0117] The following examples are intended to illustrate but not limit the invention.

EXAMPLE 1 Selection of Synthetic Transcriptional Regulatory Elements

[0118] This example describes the preparation of a vector useful for selecting transcriptional regulatory elements and the identification and characterization of synthetic transcriptional regulatory elements.

[0119] A promoter element proviral vector library was constructed using the retroviral-mediated EGFP/FACS selection strategy for synthetic promoter elements according to the disclosed methods. A library of promoter elements (random 18mers; Ran18) was constructed in the proviral selection vector, which was packaged into retroviral particles in COS1 cells. The retroviral particles were harvested and used to infect target cells, which were then treated for 3 days with puromycin to kill uninfected or poorly expressing cells. The surviving cells were subjected to FACS analysis and the most highly fluorescent cells were collected. Genomic DNA was prepared from these cells, and the regulatory oligonucleotides were recovered by PCR and identified by direct sequencing. The elements then were religated into the proviral vector for a second round of selection. Finally, the elements were ligated into the pLuc luciferase reporter vector and the activities of the elements were quantitated by luciferase assay.

[0120] Such a method involves the generation of several million enhancer/promoter cassettes, and testing their transcriptional activity in mammalian cell culture. A library of element cassettes was ligated immediately upstream of a minimal promoter unit that contains a TATA box and an initiator sequence in a selection vector (see below; see, also, FIG. 1). In order to deliver the promoter element library into cells as efficiently as possible, a selection vector was designed based on a retrovirus. The use of a retroviral delivery system has three advantages over a plasmid based system: 1) the introduction of the constructs into cells by retroviral infection is extremely efficient; 2) on average each cell receives only one promoter construct; and 3) the introduced construct is stably integrated into the cellular genome.

[0121] Production of retroviruses from a proviral vector (packaging) was achieved by transfecting the proviral vector into cells together with helper plasmids that encode the packaging functions. In the present method, the promoter element library that was constructed in a proviral vector was packaged into retroviruses by transfection into COS1 cells. These viruses were then used to infect the target cells. Each synthetic promoter element cassette in the proviral promoter element library was linked to a reporter cassette that reports on its activity after integration into the genome of the target cell. The reporter cassette contained nucleotide sequences encoding enhanced green fluorescent protein (EGFP) and puromycin N-acetyltransferase (pac), arranged in a dicistronic construct that allows two separate gene products to be expressed from a single mRNA that is driven by a single promoter. This arrangement enabled selection of synthetic promoters using fluorescence activated cell sorting (FACS) and resistance to puromycin.

[0122] After infection of cells with the retroviral promoter element library and integration into the genome, each promoter was scored for its transcriptional activity by examining the activity of the reporter gene EGFP. Using the retroviral delivery system, each cell generally received only one promoter cassette. After 2 to 3 days of infection by the retroviruses, uninfected cells were removed by treating them with puromycin; the surviving cells were then subjected to FACS analysis, and cells having the most active promoters were selected. The level of EGFP expression in each cell reflects the strength of an individual synthetic promoter element cassette, such that highly fluorescent cells are likely to contain highly active promoter elements.

[0123] After multiple rounds of selection using the EGFP/FACS analysis, the promoters were amplified from the cellular genome using the polymerase chain reaction (PCR) and subjected to automated DNA sequencing to determine the identity of each of the synthetic promoter elements. The activity of the regulatory cassette was confirmed using a luciferase reporter system that is more amenable to quantitation of promoter activity levels. To perform this quantitation of promoter activity, each synthetic promoter/luciferase plasmid was independently transfected into the cell line in which the initial selection was performed (e.g., Neuro2A neuroblastoma cells) and luciferase activity was measured using standard methods.

[0124] A. Synthetic Promoter Methodology

[0125] A library of synthetic DNA sequences to be tested for transcriptional regulatory activity was generated and screened as described below. The pool (library) of promoter elements containing random sequences or combinations of known motifs was ligated into a proviral selection vector, generating a proviral promoter element library.

[0126] Any of at least three different types of libraries of oligonucleotides can be prepared and examined according to the disclosed methods. One type of library consists of random sequences of a given length, for example, 18-mers, which are tested for their ability to enhance the activity of a minimal promoter such as a TATA motif and a site for the initiation of transcription. Such a library, which was examined as disclosed herein, has the potential to identify novel cis regulatory elements and transcription factors that bind to these elements.

[0127] A second type of library combines a random oligonucleotide sequence and a known regulatory motif, for example, a TPA responsive element (TRE; AP1 binding site). By varying the nature, polarity, number, order and spacing of known regulatory elements and random oligonucleotide sequences, such a library also can be used to identify novel cis regulatory elements and transcription factors that bind to these elements (as above), and further can identify novel promoter elements that modulate the function of known regulatory elements.

[0128] A third type of library combines transcription factor binding sites already known to function in particular contexts of eukaryotic gene regulation, for example, the binding sites for Krox, paired domain (Pax) and AP-1 (TRE), which are present in naturally occurring neuronally-expressed genes. Such a library can be used to establish rules and constraints that govern functional interactions between elements and their associated transcription factors. Construction of the library involves linking several elements together such that the order, number, and spacing of the elements are controlled, for example, using the successive element ligation procedure as disclosed herein (see Example 1F).
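As an illustration only of how the design space of such a combinatorial library grows with the order, number, and spacing of elements, the following Python sketch enumerates arrangements of named elements. The element labels are taken from the paragraph above; the spacing values, the limit on the number of elements, and the function name are hypothetical assumptions and are not part of the disclosed ligation procedure. Polarity is not modeled.

```python
from itertools import product

# Element labels from the text; no binding-site sequences are assumed here.
ELEMENTS = ("Krox", "Pax", "TRE")

def enumerate_designs(elements, max_elements, spacings):
    """Yield (ordered elements, gaps) for every arrangement of 1..max_elements sites.

    `spacings` lists the hypothetical spacer lengths (in base pairs) allowed
    between adjacent elements; an arrangement of k elements has k - 1 gaps.
    """
    for k in range(1, max_elements + 1):
        for combo in product(elements, repeat=k):          # order and repeats matter
            for gaps in product(spacings, repeat=k - 1):   # one spacer per junction
                yield combo, gaps

# Hypothetical parameters: up to 2 elements, spacers of 5 or 10 bp.
designs = list(enumerate_designs(ELEMENTS, max_elements=2, spacings=(5, 10)))
print(len(designs))  # 3 single-element designs + 3*3*2 two-element designs
```

Under these assumed parameters the count is 3^k multiplied by s^(k-1) for each length k, which shows why the number of candidate arrangements rises quickly as more elements or spacings are allowed.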

[0129] A key feature of the synthetic transcriptional regulatory element methodology of the invention is the strategy for the selection of functional promoter elements. A screening strategy was devised that allows testing of random elements or combinations of elements for transcriptional modulating activity in mammalian cells. Several key requirements for successful selection of synthetic transcriptional regulatory elements in mammalian cells are: 1) each cell should receive a single unique cassette to avoid selection of inactive elements that happen to be present in the same cell as an active element; 2) the synthetic elements should be shielded from the effects of genomic sequences that may activate or repress transcription; 3) the delivery system should be efficient so that a complex library can be readily screened; and 4) the selection process should be stringent and should be based on a reporter gene assay that is highly sensitive and that faithfully reports the activity of the promoter elements.

[0130] A library of single stranded oligonucleotides containing eighteen randomized positions (A, C, G or T at each position) was synthesized on an Applied Biosystems DNA synthesizer. This portion of the oligonucleotide was designated Ran18. Flanking the Ran18 cassette were short regions of defined sequence, including recognition sequences for the restriction enzyme Mlu I, which allowed the cassette to be inserted into the MESVR/EGFP*/IRESpacPro(ori) proviral vector (SEQ ID NO: 1; see, also, FIG. 1).
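For orientation, the theoretical complexity of a library with eighteen fully randomized positions can be computed directly, and the general layout of a Ran18 cassette can be sketched in a few lines of Python. This is a minimal sketch: the Mlu I recognition sequence (ACGCGT) is standard, but the padding sequences and the function names are hypothetical, since the actual flanking sequences are not given in this passage.

```python
import random

BASES = "ACGT"
RANDOM_LENGTH = 18        # randomized positions in Ran18
MLU_I_SITE = "ACGCGT"     # Mlu I recognition sequence

def library_complexity(n_positions=RANDOM_LENGTH):
    """Number of distinct sequences possible with four bases per position."""
    return len(BASES) ** n_positions

def random_cassette(rng, left_pad="GATC", right_pad="GATC"):
    """Build one illustrative cassette: pad + Mlu I + N18 + Mlu I + pad.

    The pads are hypothetical placeholders for the defined flanking regions.
    """
    core = "".join(rng.choice(BASES) for _ in range(RANDOM_LENGTH))
    return left_pad + MLU_I_SITE + core + MLU_I_SITE + right_pad

print(library_complexity())   # 4**18 = 68719476736
print(random_cassette(random.Random(0)))
```

Because 4^18 is roughly 6.9 × 10^10, a screen of several million cassettes samples only a small fraction of the possible 18-mers.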

[0131] To prepare the double stranded Ran18 oligonucleotides, an additional primer that was complementary to the right flanking portion of the single stranded oligonucleotide was synthesized and annealed to the Ran18 oligonucleotide. Annealing was performed with equimolar amounts of the flanking primer and the Ran18 oligonucleotide in a solution containing Tris-HCl (pH 7.5) and 1 mM MgCl₂ at 100° C. for 5 minutes, followed by slow cooling to room temperature. The second strand was generated by primer extension using the Klenow fragment of DNA polymerase I and 50 mM of dNTPs at 30° C. The double stranded oligonucleotide was purified and digested with Mlu I at 37° C. for 12 hr, and was purified by extraction from an 8% polyacrylamide gel.

[0132] After digestion, the library of Ran18 cassettes was ligated into the proviral vector at a 1:1 molar ratio of oligonucleotide to vector. Typically 0.5 to 2 μg of vector was used in each ligation in a volume of 100 μl. DNA was purified using QiaQuick PCR purification columns (Qiagen) and the ligation mixture was used to transform frozen electrocompetent XL1-Blue E. coli cells (Stratagene) by electroporation. The transformation mix was plated onto 150 mm LB plates containing ampicillin. Smaller aliquots of the transformation mix were plated onto 100 mm plates and colonies were counted to determine the number of transformants per microgram of vector. Plasmid DNA from the library was prepared via standard procedures (Qiagen Maxi-plasmid Prep) and the DNA was transfected into eukaryotic cells for retroviral packaging.
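The colony count from a small aliquot can be scaled to transformants per microgram of vector with simple arithmetic. The following Python sketch shows one way to do this; the function name and the example numbers are hypothetical and are not taken from the experiments described here.

```python
def transformants_per_microgram(colonies, fraction_plated, vector_micrograms):
    """Scale a colony count from a plated aliquot to transformants per microgram.

    colonies          -- colonies counted on the aliquot plate
    fraction_plated   -- fraction of the whole transformation mix that was plated
    vector_micrograms -- mass of vector DNA used in the transformation
    """
    if not 0 < fraction_plated <= 1:
        raise ValueError("fraction_plated must be in (0, 1]")
    if vector_micrograms <= 0:
        raise ValueError("vector_micrograms must be positive")
    total_transformants = colonies / fraction_plated
    return total_transformants / vector_micrograms

# Hypothetical example: 200 colonies from one quarter of the mix, 2 ug of vector.
print(transformants_per_microgram(200, 0.25, 2.0))  # 400.0
```

Comparing this figure with the intended library size indicates whether enough independent clones were obtained to represent the library.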

[0133] B. Retroviral Vector Construction

[0134] Retroviruses are extremely useful tools to deliver genes into eukaryotic cells both in culture and in whole animals. Currently, however, most retroviral vectors are not tailored for tissue specific or developmental stage specific delivery of genes. Thus, a benefit of screening a retroviral library for functional synthetic regulatory elements as disclosed herein is the potential to create novel retroviruses with exquisite target specificity. Such vectors can be extremely useful for generating cell lines or transgenic animals for diagnostic screening procedures and drug development. In addition, such vectors can be useful for gene therapy in humans.

[0135] A retrovirus is a single stranded RNA virus that infects a cell and integrates into the genome of a cell by copying itself into a double stranded DNA molecule by reverse transcription. The integrated retrovirus genome is referred to as a provirus. Retroviruses have a two stage life cycle, existing in both an RNA and a DNA form. The RNA form of the virus is packaged into an infectious particle that is coated with a glycoprotein that is recognized by receptors on the host cell. This interaction promotes a receptor mediated internalization event, resulting in exceptionally efficient delivery of the viral genome into the cell. After transport to the cell nucleus and uncoating, the RNA genome is reverse transcribed into a DNA form (a provirus). During the reverse transcription process, the provirus integrates into the host cell genome. Retroviruses do not integrate in a completely random fashion, but instead have a distinct preference for integration into regions of the genome that are transcriptionally competent. This characteristic reduces the likelihood that the provirus will be silenced by integration into a transcriptionally repressive domain.

[0136] In a recombinant retrovirus, the entire coding region of the virus is removed and replaced with a transgene. This replacement is done by standard molecular biological techniques using a proviral version of the virus that is propagated as a bacterial plasmid (a pro-retroviral vector). However, other sequences in the retrovirus genome are required for the functions of viral transcription and packaging: these genes encode the viral gag and pol proteins, and the viral glycoprotein coat. While such sequences can be removed from the pro-retroviral plasmid, in order to obtain a fully functional recombinant virus, they must be provided in trans, for example, on other plasmids that are introduced into the host cell via cellular transfection. Alternatively, these helper functions can be designed to already be integrated into the cellular genome of the viral packaging line.

[0137] Retroviruses have two viral promoters called long terminal repeats (LTRs), one located at each end of the viral genome. The upstream LTR is responsible for promoting transcription of the DNA provirus into the RNA form. The downstream LTR is not used for transcription during the RNA phase of the life cycle. However, during reverse transcription of the RNA into the DNA provirus, the downstream LTR provides a template for the replication of the upstream LTR. Thus, native retroviruses contain identical sequences in their upstream and downstream LTRs.

[0138] Nucleotide sequences that encode enhanced green fluorescent protein (EGFP) and puromycin N-acetyltransferase (pac) were inserted into a retroviral vector (see below). The two reporter genes are expressed as a single transcript, and are linked by an internal ribosome entry sequence (IRES). Expression of both reporter genes is controlled by the same promoter. The upstream LTR was modified to contain a strong promoter from the Rous sarcoma virus (RSV), thus ensuring efficient transcription of the RNA viral genome and a high viral titer. The downstream LTR was modified to contain a minimal synthetic promoter and a multiple cloning site for insertion of the Ran18 elements. The downstream LTR is not used for transcription during the RNA phase of the life cycle. However, during reverse transcription of the RNA into the DNA provirus, the downstream LTR provides a template for the replication of the upstream LTR. From this position, the Ran18/minimal promoter cassette can drive expression of the reporter genes in the integrated form of the virus.

[0139] The MESVR/EGFP*/IRESpacPro(ori) vector (SEQ ID NO: 1) was based on MESV/IRESneo (Owens et al., supra, 1998), which, in turn, was based on the Murine Embryonic Stem cell Virus (MESV) retrovirus (Mooslehner et al., J. Virol., 64:3056-3058, 1990; Rohdewohld et al., J. Virol., 61:336-343, 1987, each of which is incorporated herein by reference). MESV is a C-type retrovirus that was modified to remove sequences that are necessary for independent replication. Consequently, the virus can only replicate with the assistance of helper genes that encode the proteins required for viral genome packaging and insertion into the host genome.

[0140] Five different insertions were made to produce the final MESVR/EGFP*/IRESpacPro(ori) vector, which contains 6357 base pairs (SEQ ID NO: 1). First, a cassette containing a polylinker for the insertion of Ran18 elements, the adenovirus major late promoter, the initiator (Inr) from the mouse terminal deoxynucleotidyl transferase gene, and a complete R region was inserted at the downstream U3 region (Lagrange et al., Genes Devel. 12:34-44, 1998; Colgan et al., Proc. Natl. Acad. Sci. USA 92:1955-1959, 1995, each of which is incorporated herein by reference). Second, the U3 region enhancer elements from RSV were inserted at the upstream LTR. The source of the RSV enhancer elements was the pRc/RSV plasmid (Invitrogen Corp., La Jolla Calif.). Third, mutations to produce a green fluorescent protein (GFP) having enhanced expression (EGFP) were introduced (Zernicka-Goetz et al., Development 124:1133-1137, 1997, which is incorporated herein by reference). Fourth, a copy of the puromycin N-acetyltransferase (pac) gene was inserted downstream of the IRES after excising the neomycin resistance gene. The source of the pac gene was the pPUR plasmid (Clontech, Palo Alto Calif.). Fifth, an SV40 origin of replication was inserted into the plasmid. The source of the SV40 origin was the plasmid pcDNA3.1 (Invitrogen Corp.). Many of the fragments were generated as PCR products from vectors from commercial sources.

[0141] The relevant portion of the retroviral vector MESVR/EGFP*/IRESpacPro(ori) (SEQ ID NO: 1) is shown in FIG. 1. As indicated above, it contains a strong enhancer from RSV in the position of the upstream LTR that drives expression of the RNA viral genome, and contains a minimal synthetic promoter in the position of the downstream LTR (FIG. 1). The multiple cloning site upstream of this minimal promoter permits the insertion of oligonucleotides such as the Ran18 elements to generate a library of proviruses, each containing a unique promoter cassette in the downstream LTR. The proviral vector library was transfected into mammalian cells together with helper plasmids required for viral production, including a plasmid that encodes the group antigen (gag) and the integrase enzyme (pol) that is packaged with the RNA genome, as well as a plasmid that encodes the glycoprotein coat (VSV-G).

[0142] Retroviruses exist as RNA and DNA forms. The DNA form is referred to as the provirus and must be transcribed to generate the RNA form that is packaged into an infectious viral particle. The viral particle is coated with a glycoprotein that is recognized by receptors on the host cell, leading to receptor-mediated internalization. After entry into the cell nucleus, the RNA genome is reverse transcribed into the DNA form, which is stably integrated into the host cell genome.

[0143] The viral packaging protocol involved a triple transfection into COS1 cells of a library containing pro-retroviral vectors that harbor the putative promoter elements together with the two separate plasmids that encode the gag/pol and VSV-G proteins, respectively. Cellular transcription machinery is used to generate the viral RNA strands that are packaged into viral particles and subsequently bud from the cell membrane. These viral particles can infect a naive cell as described above. After reverse transcription and integration, the strong promoter located in the upstream LTR is lost and is replaced by the Ran18/minimal promoter cassette from the downstream LTR. Thus, the viral library is fully representative of the original vector library because all viral RNAs were transcribed from the same strong promoter. In contrast, each integrated DNA version of the virus contains a different Ran18 cassette in the upstream LTR, which now drives expression of the selectable markers, EGFP and pac, selection for which indicates the strength of activity of the promoter cassette.

[0144] Packaging of the proviral vector library was achieved by cotransfection of the proviral DNA into COS1 cells together with the packaging genes, which are contained on two separate helper plasmids, pCMV-GP(sal) and pMD.G. The pCMV-GP(sal) plasmid has a cytomegalovirus promoter (pCMV) driving the genes that encode the group antigen (gag) and reverse transcriptase enzyme (pol) from the Moloney murine leukemia virus (MMLV). The pMD.G plasmid encodes the vesicular stomatitis virus G glycoprotein (Naldini et al., Science 272(5259):263-267, 1996, which is incorporated herein by reference). These two plasmids were cotransfected into COS1 cells along with the library of recombinant retroviral vectors containing putative promoter elements in order to generate a library of retroviruses.

[0145] COS1 cells were seeded into 100 mm dishes at 8×10⁵ cells/dish and transfected 24 hr later with 4 μg of proviral library DNA and 4 μg of the pCMV/gag-pol and pCMV/VSV-G plasmids using Fugene transfection reagent (Roche). The cellular transcription machinery generates viral RNA strands that are packaged into viral particles and subsequently bud from the cell membrane into the culture medium. The medium was collected, diluted with an equal volume of medium, filtered to remove cellular debris, and combined with polybrene to a final concentration of 2.5 mg/ml of viral supernatant. This mixture was used to infect Neuro2A cells in monolayer culture. The ratio of viral particles to cells was optimized so as to ensure a high probability of single infection/integration events, and generally resulted in infection of 25-40% of the Neuro2A cells.
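The relationship between the fraction of infected cells and the likelihood of single integration events can be illustrated with a simple Poisson model. This model is an assumption introduced here for illustration and is not stated in the text; the function names are hypothetical. It treats the number of integrations per cell as Poisson distributed with mean m, so that the infected fraction is 1 − e^(−m).

```python
import math

def mean_integrations(fraction_infected):
    """Mean integrations per cell (m) implied by the infected fraction,
    assuming a Poisson model: fraction_infected = 1 - exp(-m)."""
    if not 0 <= fraction_infected < 1:
        raise ValueError("fraction_infected must be in [0, 1)")
    return -math.log(1.0 - fraction_infected)

def single_hit_fraction(fraction_infected):
    """Fraction of infected cells expected to carry exactly one integration."""
    m = mean_integrations(fraction_infected)
    if m == 0:
        return 1.0
    p_one = m * math.exp(-m)           # P(exactly one integration)
    p_any = 1.0 - math.exp(-m)         # P(at least one integration)
    return p_one / p_any

for f in (0.25, 0.40):
    print(f, round(single_hit_fraction(f), 3))
```

Under this assumed model, infection of 25% to 40% of the cells corresponds to roughly 86% to 77% of infected cells carrying a single integration, which is consistent with the stated goal of favoring single infection/integration events.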

[0146] C. Characterization of the Selection Method

[0147] In order to demonstrate the feasibility and efficacy of the retroviral delivery and FACS selection of synthetic regulatory elements, an initial set of experiments was performed in which proviral plasmids were prepared containing the minimal promoter, Pmin, alone; a minimal promoter containing three copies of the TRE/AP-1 element (3×TRE); or a full strength RSV promoter. The latter two regulatory elements were expected to drive expression of the EGFP gene at a high level, whereas the minimal promoter represents the baseline activity.

[0148] Actively infecting retroviruses were prepared for each of these three promoter constructs by carrying out a triple transfection of a monolayer of actively dividing COS-1 cells with the proviral plasmid and two helper plasmids encoding genes that are essential for the propagation of active virus. The culture media containing fully active viral particles corresponding to each of the three promoters was collected and used to infect the target neuroblastoma cell line, Neuro2A. These cells were selected for this study because they grow quickly, are relatively non-adherent, have a high transfection efficiency, and are efficiently infected using the retroviral vector.

[0149] To establish the maximal and minimal values of promoter activity obtainable using this EGFP/FACS selection procedure, several control experiments were performed using the very strong (RSV), moderately strong (3×TRE), and minimal (Pmin) promoters. These experiments were performed in order to determine the optimal gating of cells so that only highly active Ran18 elements would be assayed. Neuro2A cells infected with the retrovirus in which the EGFP reporter was driven by the strong RSV promoter showed a high level of EGFP fluorescence, and the cells infected with the 3×TRE retrovirus showed an intermediate level of fluorescence. For each of the RSV and TRE-containing retroviruses, the number of highly fluorescent cells was considered to be equivalent to the number of infected cells. Thus, approximately 30% of the cells were infected by the retroviruses. In addition to the positive controls, a negative control population of cells was infected with a retrovirus containing only the minimal promoter (TATA box). Cells infected with the Pmin-containing retrovirus showed only background levels of autofluorescence, thus providing a baseline for the level of EGFP expression that is produced by the minimal promoter in the absence of an enhancer. These results demonstrate that Neuro2A cells can be efficiently infected with the promoter-containing retroviruses and that the EGFP fluorescence is sufficiently strong to distinguish active promoters from inactive or weak promoters.

[0150] D. Selection of Synthetic Transcriptional Regulatory Elements

[0151] A library of synthetic oligonucleotides, each containing a random sequence of eighteen base pairs (Ran18) to be examined for transcriptional regulatory activity, was ligated into the Mlu I restriction site immediately upstream of the minimal promoter in the proviral selection vector, generating a library of greater than 5×10⁷ individual members. This Ran18 promoter element library was packaged into retroviral particles, which were used to infect the neuroblastoma cell line Neuro2A. After 24 hours, 1 mg/ml puromycin was added to the infected cells, and treatment with puromycin was continued for 3 days to kill uninfected cells. Surviving cells were sorted using a FACSTAR fluorescence activated cell sorter (Becton Dickinson). Control cells were infected with a reporter retrovirus containing either the minimal promoter (Pmin) or a strong promoter (RSV) to drive expression of the EGFP reporter gene. The Pmin control provides a baseline for the level of EGFP expression that is produced by the minimal promoter in the absence of an enhancer. The RSV control provides a measure of infection efficiency.

[0152] The fluorescence profile of cells infected with the Ran18 library was compared with that of the Pmin promoter control to determine the fluorescence threshold for promoter element selection. Approximately 1% of the cells showed greater fluorescence than that observed for the minimal promoter alone. Given a viral infectivity of about 33% based on expression for the RSV promoter, about 3% of the elements in the Ran18 promoter element library enhanced the activity of the minimal promoter.
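The correction for infectivity in the preceding paragraph is a simple ratio, sketched below. The function and variable names are illustrative only.

```python
def active_fraction_of_library(fluorescent_fraction: float,
                               infected_fraction: float) -> float:
    """Fraction of library elements that enhance the minimal promoter.

    Only infected cells can report on an element, so the fraction of all
    cells above the Pmin threshold is divided by the infected fraction.
    """
    return fluorescent_fraction / infected_fraction


# Values taken from the text: about 1% of cells above threshold, about 33% infected.
active = active_fraction_of_library(0.01, 0.33)
print(f"Active elements: about {active:.0%} of the library")
```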

[0153] The most highly fluorescent cells were collected and genomic DNA was extracted using the QiaAmp Tissue Kit (Qiagen). The Ran18 cassettes were recovered from the genomic DNA by PCR amplification using primers that flank the Ran18 promoter cassette. The amplified promoters were digested with Nsi I and Bgl II to liberate the Ran18 element cassettes, which then were religated into the proviral selection vector to produce a second generation library, and the EGFP/FACS selection procedure was repeated.

[0154] Following the second round of EGFP/FACS mediated selection, Ran18 promoter element cassettes were again recovered by genomic PCR. The amplified promoter cassettes were digested with Nsi I and Eco RI to generate a fragment that includes the Ran18 cassette and the minimal promoter, and the liberated fragments were ligated into a promoter-less luciferase reporter vector (pLuc) to generate Ran18/promoter/pLuc plasmids. The pLuc plasmid was made by introducing a polylinker containing restriction endonuclease sites for Nsi I, Stu I and Eco RI into the Kpn I/Hind III site of the luciferase reporter plasmid pGL3-Basic (Promega). Following bacterial transformation, individual subclones were isolated, and 300 ng of each was sequenced using an automated DNA sequencer (Perkin-Elmer Applied Biosystems 373 sequencer) to determine the identity of each functional Ran18 promoter element. The sequences then were compared to databases of known regulatory motifs (Transfac and TFD databases).

[0155] Two salient features were noted in the sequences of the Ran18 elements selected after two rounds of EGFP/FACS selection. First, as a result of the non-directional cloning strategy, most of the elements contained multiple copies (generally two) of the Ran18 sequences. Comparison of the selected elements with a set of Ran18 elements that were ligated into the same Mlu I restriction site in the proviral vector, but not subjected to EGFP/FACS based selection, indicated that the proportion of multimerized elements was significantly increased in the selected set (70% in the selected set compared to 24% in the unselected set). Second, a large number of the selected Ran18 sequences contained binding sites for known transcription factors, including c-Ets-2, glucocorticoid receptor (GR), E2F-1, Sp1, AP1, kY factor, CP1, TFIID, PTF-1β, DTF-1, AP2, PEA3, TBP, NF-1, UCRF-L, F-ACT1, CTF, ETF, GATA-1, c-Myc, C/EBPα, lk2, GATA, and ΔEF1. However, several of the selected Ran18 elements contained no known binding motifs and appear to be novel transcriptional regulatory sequences (SEQ ID NOS: 10, 11 and 13 to 15).

[0156] The transcriptional activity of individual Ran18 promoter elements was quantified by luciferase assays after transient transfection of the Ran18/pLuc subclones into Neuro2A cells. Each Ran18/pLuc reporter vector was co-transfected with the control plasmid CMVβgal, which encodes β-galactosidase, to normalize for transfection efficiency. A pLuc reporter vector containing only the minimal promoter unit was used to provide a baseline for the activity of the minimal promoter. Two hundred Ran18/pLuc subclones containing selected Ran18 elements were analyzed by transient transfection and luciferase assay. Approximately 25% of these plasmids produced luciferase activity that was greater than 4-fold above that produced by the minimal promoter, with the highest level of activity being 17-fold above that of the minimal promoter. In contrast, only about 1% of the elements of a comparable set of unselected Ran18 elements had activity greater than 4-fold above that of the minimal promoter.
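The normalization described above can be sketched as follows. The readings are hypothetical values chosen for illustration, and the function name and the way the 4-fold cutoff is applied are assumptions rather than the procedure actually used.

```python
def fold_over_minimal(luc: float, bgal: float,
                      luc_min: float, bgal_min: float) -> float:
    """Luciferase activity relative to the minimal promoter.

    Each luciferase reading is first divided by the co-transfected
    beta-galactosidase reading to correct for transfection efficiency.
    """
    normalized = luc / bgal
    normalized_min = luc_min / bgal_min
    return normalized / normalized_min


# Hypothetical raw readings (arbitrary units), not data from the specification.
fold = fold_over_minimal(luc=8400.0, bgal=1.2, luc_min=1000.0, bgal_min=1.0)
is_active = fold > 4.0  # 4-fold cutoff taken from the text
print(f"{fold:.1f}-fold over the minimal promoter; active: {is_active}")
```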

[0157] E. Characterization and Uses of Synthetic Transcriptional Regulatory Elements

[0158] The selected transcriptional regulatory elements can be examined in a variety of ways, including 1) the level of transcriptional activity produced by each element can be determined using luciferase assays, 2) novel sequences within the element can be multimerized and used as bait in either a yeast one-hybrid screening assay or a southwestern screening procedure to isolate potentially novel transcription factors to which the elements bind, 3) activity of the elements can be compared in different cell types or cellular environments, such as in the presence of growth factor treatment, to identify elements that function in one context but not the other and, therefore, can be useful as a fingerprint for a particular cell type or cellular state, and 4) functional elements can be recombined to examine the rules and constraints governing functional interactions between cis-acting regulatory sequences. In addition, recombination of the elements can produce new elements that combine the benefits of particular individual elements such as strength or cell-type specificity.

[0159] A database was created containing the functional Ran18 elements obtained in the above selection procedure, and elements were categorized into those that contained sequences that bind to known transcription factors and those that contained completely novel sequences. In addition, these functional elements were compared to each other to determine the frequency of particular sequence motifs, which reflects the relative abundance of specific transcription factors present in the cells used in the selection process. This promoter element database can be compared to lists of elements that are selected in different cell lines, or in the same cell population that is treated with a different growth factor or drug (see below), thus extending the disclosed selection process to identify Ran18 elements or other regulatory oligonucleotides that function in different cellular environments, for example, in different cell types or in proliferating versus differentiating cells, to determine differences and similarities in the sets of transcriptional regulatory elements that function during these processes.

[0160] Active oligonucleotide regulatory elements such as the exemplified Ran18 elements also can be selected for combinatorial analysis by ligating them together using a method such as the successive element ligation procedure (see Example 1F). Once combinations of functional elements are prepared, the synthetic promoter selection procedure is performed on this combinatorial element library. The identified functional promoter elements then are used in DNA/protein binding studies to characterize the transcriptional regulatory proteins to which these elements bind and to identify novel transcription factors. The southwestern screening procedure (Vinson et al., Genes Devel., 2:801, 1988; Singh et al., Cell, 52:415-423, 1988) or the yeast one hybrid technique (Wang et al., Nature, 364:121-126, 1993; Li et al., Science, 262:1870-1874, 1993; Dowell et al., Science, 265:1243-1246, 1994) can be used for these studies. In addition, characterization of the binding properties of selected elements can be carried out using an electrophoretic mobility shift assay (EMSA).

[0161] The ability of cellular proteins to specifically interact with three selected Ran18 elements was examined: S131 (SEQ ID NO: 16), which contains AP1, SP1, CP1, ETF and c-Ets-2 binding motifs; S133 (SEQ ID NO: 12), which contains an SP1 binding motif; and S146 (SEQ ID NO: 17), which contains C/EBPα, GR, and PR binding motifs. The Ran18 elements were radiolabelled and combined with nuclear extracts from the Neuro2A neuroblastoma cells or from 3T3 fibroblasts, then the resulting DNA-protein complexes were examined by EMSA. Both cell type-specific and ubiquitous complexes were observed. The S131 and S133 elements both contained Sp1 binding sites, and an Sp1 competitor oligonucleotide, which corresponds to the sequence of an Sp1 binding site, competed for some or all of the complexes formed with these probes. Similarly, element S146, which contains a glucocorticoid response element, formed one complex that was disrupted by incubation with a specific GR competitor, as well as additional complexes that were not disrupted by the GR competitor. These results demonstrate that the selected Ran18 elements can specifically interact with nuclear proteins, including nuclear proteins expressed only in Neuro2A cells.

[0162] The promoter selection techniques disclosed herein can be readily applied for use in disease diagnostic procedures by identifying regulatory elements that are highly active only in specific cell types or cellular contexts. A library of random promoter elements is screened for transcriptional activity in cell lines derived from several different tissue types or from cells that are subjected to a particular treatment, for example, treatment with a growth and differentiation factor such as the TGF-β family growth factor, bone morphogenic factor-4, with signaling molecules or with antiproliferative agents. Regulatory elements that are highly active in these different contexts are sequenced and used to create a “transcriptional element profile” for the cell type or cellular response.

[0163] The synthetic promoters also can be used as markers for disease. Many disease states are characterized by aberrant regulation of transcription, often affecting multiple genes. The synthetic promoter selection strategy is used to rapidly identify promoters that show elevated levels of expression in a specific disease state. These promoters are then linked to a reporter gene such as EGFP and integrated into cultured mammalian cells to create a battery of cell lines that model the aberrant transcriptional regulation associated with the disease. Candidate drug treatments can be tested for the ability to alter the activities of these promoters. In a simple model, a panel of drugs can be screened and a drug can be identified that reduces the activity, for example, of 10 out of 12 synthetic promoters whose activity is correlated with the disease. As such, the drug is identified as likely to be targeting a common factor or pathway involved in the activation of each of these promoters. The reporter constructs also can be integrated into transgenic mice such that the expression of EGFP provides a dynamic reporter system that allows the effectiveness of therapeutic agents to be monitored over the course of treatment.

[0164] Synthetic promoters that regulate cell specific expression can be used for cell specific expression of a therapeutic gene product in patients using a retroviral mediated gene therapy procedure. For example, a pro-apoptotic agent such as the Bax gene product can be expressed under the control of a synthetic promoter that was selected based on its ability to function only in glioma cells, but not in normal cells, such that expression of the Bax gene only occurs in the glioma cells and selectively kills the glioma cells.

[0165] Thus, by selecting elements in different cellular environments, such as those representing normal and diseased states, a set of synthetic promoters can be identified that are responsive (i.e., have transcriptional competence in a particular cellular context), thereby providing a means to diagnose a disease state. A population of such elements can be used, for example, as an array to fingerprint a particular disease phenotype. For instance, the growth patterns and responsiveness of specific tumor cells to various hormones, cytokines, and synthetic agonists or antagonists of these molecules can be probed by determining the regulatory elements and associated transcriptional proteins that are utilized in particular tumor cells. In addition to the potential utility of the promoter selection procedure for disease diagnostics, the method can be useful for constructing synthetic promoters for tissue specific or cellular state specific delivery of transgenes, for example, for gene therapy in humans, or for developmental and gene replacement studies in animals.

[0166] F. Successive Element Ligation Procedure

[0167] The successive element ligation procedure provides a method for producing multimers of individual regulatory elements into larger cassettes, thus providing a means to generate combinations of particular regulatory elements that lead to a desired pattern or level of expression of an operatively linked polynucleotide. The procedure generally provides a means to randomly link individual transcriptional or translational regulatory elements into cassettes using successive unidirectional ligation to a DNA adaptor immobilized on a solid support, for example, paramagnetic particles coated with streptavidin.

[0168] Individual regulatory elements are designed to contain CTCT and GAGA overhangs (or other selected anti-complementary sequences) on the “top” and “bottom” strands, respectively. An adaptor oligonucleotide containing a biotin group at its 5′ end is annealed to a bottom strand oligonucleotide, which contains the 5′ overhang sequence, GAGA. The resulting duplex adaptor contains an Nsi I restriction site, which allows cleavage of the multimerized cassette at the end of the procedure. The biotin tagged adaptor is then attached to streptavidin beads and phosphorylated, thereby enabling the ligation of the first regulatory element to the immobilized adaptor complex; the first element contains a donor 5′ overhang sequence, CTCT, that is compatible with the recipient GAGA of the adaptor. After ligation of the first element to the immobilized adaptor, the phosphorylation reaction is repeated and the first element is now ready to accept ligation of a second element. This procedure is reiterated to generate a growing chain of regulatory elements. Once a cassette of a given length is synthesized, a capping adaptor oligonucleotide containing an Mlu I restriction site is ligated, terminating the synthesis of elements. The cassettes produced by this procedure are then amplified by PCR, digested with Nsi I and Bgl II to remove the capping adaptors and biotin, and cloned into the Nsi I and Bgl II sites of the proviral promoter selection vector. The combinatorial proviral promoter library is screened to select effective regulatory element combinations as described above.

[0169] For the ligation procedure, streptavidin MagneSphere Paramagnetic Particles (Promega, Madison Wis.) are washed three times with 0.5×SSC, capturing the beads using a magnetic stand each time between washes. The beads are then resuspended in 100 μl of 0.5×SSC and 200 pmol of an adaptor oligonucleotide, which contains a biotin group on the 5′ end, is attached to the beads through the streptavidin-biotin interaction. The adaptor also contains an Nsi I restriction enzyme cleavage site to clone the cassette following its synthesis. The bound adaptor then is phosphorylated using 300 pmol of ATP and 100 units of polynucleotide kinase in preparation for ligation with individual elements. Pools of elements in equimolar amounts (3 mM each, 30 mM total) are ligated onto the adaptor using 5 units of T4 DNA ligase. The oligonucleotides encoding these elements all contain compatible overhangs of GAGA on the 5′ end and CTCT on the 3′ end to facilitate assembly. Between enzymatic manipulations, the beads are washed 3 times with 0.5×SSC and once with the reaction buffer of the next step. This step is reiterated to generate the desired cassette length. Finally, a capping oligonucleotide, which contains a Bgl II site, is ligated onto the assembled element cassette. This oligonucleotide in combination with the adaptor is used to facilitate cassette amplification via PCR. The amplified products are then digested with Nsi I and Bgl II and cloned into the proviral selection vector, and combinations of regulatory elements having desirable characteristics can be selected.
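The combinatorial outcome of the successive ligation rounds can be sketched in silico. The example below is a simplified model that treats each ligation round as a uniform random draw from an equimolar pool of ten elements (consistent with 3 mM each, 30 mM total); the element labels, the number of rounds, and the uniform-draw assumption are hypothetical and ignore ligation efficiency.

```python
import random


def assemble_cassette(element_pool, rounds, rng):
    """Model one bead-bound cassette: adaptor, then one element per round, then cap."""
    cassette = ["adaptor(Nsi I)"]
    for _ in range(rounds):
        # Each round ligates one element drawn from the equimolar pool.
        cassette.append(rng.choice(element_pool))
    cassette.append("cap(Bgl II)")
    return cassette


rng = random.Random(0)  # fixed seed so the sketch is reproducible
pool = [f"element_{i}" for i in range(1, 11)]  # ten hypothetical elements
rounds = 4  # hypothetical cassette length
library = [assemble_cassette(pool, rounds, rng) for _ in range(5)]
possible_cassettes = len(pool) ** rounds  # ordered combinations of elements

for cassette in library:
    print(" + ".join(cassette))
print(f"Distinct ordered cassettes possible: {possible_cassettes}")
```

With ten elements and four rounds, the model allows 10,000 distinct ordered cassettes, which illustrates why the resulting library is screened by selection rather than tested clone by clone.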

EXAMPLE 2 Validation of Synthetic Regulatory Element Selection Method

[0170] This example demonstrates that the disclosed synthetic regulatory element selection method can be used routinely to screen libraries of oligonucleotides and can consistently identify synthetic transcriptional regulatory elements.

[0171] The retrovirus vector MESVR/EGFP*/IRES/pacPro(ori) (SEQ ID NO: 1; see Example 1B) was used to screen a second library of Ran18 sequences using the synthetic promoter construction method (SPCM). More than 100 DNA sequences that showed increased promoter activity (4 to 50-fold) in the neuroblastoma cell line Neuro2A were identified. The DNA sequences of selected synthetic promoters were determined and searched using the RIGHT software package, which allowed simultaneous comparison of a database of active Ran18 elements to existing databases such as TransFac. The search revealed a predominance of eight motifs: AP2, CEBP, GRE, Ebox, ETS, CREB, AP1, and SP1/MAZ; about 5 to 10% of the active DNA sequences were not represented in known transcription factor databases and appeared to be novel. The most active of the selected synthetic promoters contained composites of pairs, triples, or quadruples of these motifs. Assays of DNA binding and promoter activity of three exemplary motifs (ETS, CREB, and SP1/MAZ) confirmed the effectiveness of SPCM in identifying functional transcriptional regulatory elements.

[0172] Methods and reagents were essentially as described in Example 1. Ran18 oligonucleotides were constructed using a PE Biosystems DNA synthesizer. Ran18 elements were flanked by two different sequences (left: ctactcacgcgtgatcca, SEQ ID NO: 18; and right: cggcgaacgcgtgcaatg, SEQ ID NO: 19) containing the Mlu I restriction site that allowed cloning into the selection vector. Double stranded Ran18 sequences were generated by primer extension, digested with Mlu I, and purified by extraction from an 8% polyacrylamide gel. The library of Ran18 sequences was ligated into the MESVR/EGFP*/IRES/pacPro(ori) (SEQ ID NO: 1) retroviral vector and transformed into XL1-Blue E. coli (Stratagene, San Diego, Calif.). Plasmid DNA was prepared using Maxi-Prep columns (Qiagen, Valencia, Calif.). Packaging was achieved by co-transfection of the proviral DNA library into COS1 cells together with the helper plasmids, pCMV-GP(sal) and pMD.G (see Example 1).
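The design of the flanked Ran18 oligonucleotides can be illustrated with a short sketch. The flanking sequences are those given above (SEQ ID NOS: 18 and 19); the function name and the use of a uniform pseudo-random generator are assumptions for illustration, since actual randomization occurs during chemical synthesis.

```python
import random

LEFT_FLANK = "ctactcacgcgtgatcca"   # SEQ ID NO: 18
RIGHT_FLANK = "cggcgaacgcgtgcaatg"  # SEQ ID NO: 19
MLU_I_SITE = "acgcgt"               # Mlu I recognition sequence


def random_ran18_oligo(rng: random.Random) -> str:
    """Return one flanked oligonucleotide with an 18-nucleotide random core."""
    core = "".join(rng.choice("ACGT") for _ in range(18))
    return LEFT_FLANK + core + RIGHT_FLANK


rng = random.Random(1)  # fixed seed so the sketch is reproducible
oligo = random_ran18_oligo(rng)
sequence_space = 4 ** 18  # number of distinct 18-nucleotide cores

print(oligo)
print(f"Possible Ran18 cores: {sequence_space:,}")
```

The sequence space of 68,719,476,736 possible cores is far larger than a library of the size described in Example 1, so any one library samples only a small fraction of the possible Ran18 sequences.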

[0173] Three 100 mm dishes of COS1 cells (8×10⁵ cells/dish) were transfected with 4 μg of proviral library DNA, 4 μg of the pCMV/gag-pol plasmid and 2 μg of the pCMV/VSV-G plasmid using FuGENE 6 transfection reagent (Roche). Media were changed 24 hr later, and supernatant containing retroviral particles was collected after an additional 24 hr, filtered, and combined with polybrene to a final concentration of 2.5 μg/ml. This mixture was used to infect Neuro2A cells in monolayer culture. The ratio of viral particles to cells was optimized to ensure a high probability of single infection/integration events; this ratio generally resulted in infection of 25 to 40% of the Neuro2A cells. After retroviral infection, each cell incorporated on average a single integrated DNA provirus containing a different Ran18 element upstream of the minimal promoter and the selectable markers, EGFP and pac. Identification of active Ran18 promoter elements involved two selection steps (see Example 1).

[0174] To quantify the activity of Ran18 elements, the Ran18/pLucPro plasmids were transfected into Neuro2A cells in 24 well tissue culture plates. One hundred nanograms of each reporter was transfected together with CMVβgal to normalize for transfection efficiency and 48 hr later the cells were harvested and assayed for β-galactosidase and luciferase activity (Example 1). The activity of pLucPro was used as a reference standard for measuring the levels of luciferase activity generated by selected Ran18/promoters.

[0175] Ran18 elements were sequenced using an automated DNA sequencer (Model 373, PE Biosystems, Foster City, Calif.). Sequences were searched for candidate transcription factor binding motifs present in the TransFac database (release 3.5) using the RIGHT (Reeke's Interactive Gene Hacking Tool) software package. RIGHT is a motif recognition program based on a regular expression search and is particularly useful for SPCM because it allows a batch format for sequence input and has the capacity to simultaneously analyze large numbers of Ran18 promoter sequences. The unselected (“U”) Ran18 elements showed a Gaussian distribution with a mean activity of 2-fold and a standard deviation of 0.8. Using this distribution for the activities of the U Ran18 elements and allowing for a confidence interval of 98%, it was determined that 4-fold activity above that of the minimal promoter represented a statistically significant level.
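The 4-fold cutoff can be checked against the stated distribution. The sketch below assumes that the 98% confidence interval refers to the central 98% of a normal distribution with mean 2.0 and standard deviation 0.8, so that its upper limit is the 99th percentile; that reading is an interpretation and is not stated explicitly above.

```python
from statistics import NormalDist

# Activity (fold over minimal promoter) of the unselected (U) Ran18 elements.
unselected = NormalDist(mu=2.0, sigma=0.8)

# Upper limit of the central 98% interval (1% left in each tail).
upper_limit = unselected.inv_cdf(0.99)

# Fraction of unselected elements expected to exceed the 4-fold cutoff.
tail_above_4 = 1.0 - unselected.cdf(4.0)

print(f"Upper limit of central 98% interval: {upper_limit:.2f}-fold")
print(f"Expected fraction of U elements above 4-fold: {tail_above_4:.2%}")
```

Under this reading the upper limit is about 3.9-fold, which is consistent with rounding to a 4-fold threshold, and well under 1% of unselected elements would be expected to exceed it.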

[0176] Analysis of the distribution of activities of the 480 selected elements (“S”), superimposed upon the normal distribution from U Ran18 sequences, revealed that 120 of the selected (S) Ran18 sequences (approximately 25%) had activity that was 4 to 50-fold greater than that of the minimal promoter. In comparison, only one of the U Ran18 sequences (less than 1% of the total) showed greater than 4-fold activity. Thus, SPCM provided approximately 25-fold enrichment of active promoter elements. A group of S Ran18 sequences that was highly active in luciferase assays also was examined and is referred to as the SLA (selected luciferase activator) Ran18 elements.
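The enrichment figure follows from the two proportions given above. In the sketch below the unselected proportion is set to 1%, which the text gives only as an upper bound ("less than 1%"), so the computed value should be read as an approximate lower estimate.

```python
selected_active = 120 / 480   # S elements above 4-fold, from the text
unselected_active = 0.01      # "less than 1%", taken here as 1% (assumption)

enrichment = selected_active / unselected_active
print(f"Selected active fraction: {selected_active:.0%}")
print(f"Enrichment over unselected elements: about {enrichment:.0f}-fold")
```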

[0177] The DNA sequences of 106 SLA, 133 S, and 132 U Ran18 elements were determined and compared to known motifs within the TransFac database. Only motifs having 100% sequence identity with TransFac motifs with a length of 6 base pairs or greater were scored as matches. Known regulatory motifs were identified in each of the three sets, but the prevalence and linear arrangement of particular motifs differed among the sets.
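The scoring rule (exact matches of at least 6 base pairs) can be sketched with a regular expression search. This is not the RIGHT program; the two motif patterns below are illustrative placeholders rather than TransFac entries, and only the forward strand is scanned. The test sequence is the right-hand Ran18 element of MS44 as listed in Table 1.

```python
import re

# Illustrative patterns only; these are assumptions, not TransFac entries.
MOTIFS = {"ETS_like": "AGGAAG", "CRE_like": "TGACGT"}


def find_exact_matches(sequence, motifs, min_length=6):
    """Return (name, start, end) for every exact forward-strand motif match."""
    seq = sequence.upper()
    hits = []
    for name, pattern in motifs.items():
        if len(pattern) < min_length:
            continue  # motifs shorter than the cutoff are not scored
        # A lookahead reports overlapping occurrences of the same motif.
        for match in re.finditer(f"(?={re.escape(pattern)})", seq):
            hits.append((name, match.start(), match.start() + len(pattern)))
    return sorted(hits, key=lambda hit: hit[1])


ms44b = "CCAGGAAGTGACGTATCA"  # right-hand element of MS44 (SEQ ID NO: 20)
hits = find_exact_matches(ms44b, MOTIFS)
for name, start, end in hits:
    print(f"{name}: positions {start}-{end}")
```

In this example the two placeholder motifs are found at positions 2 to 8 and 8 to 14, that is, directly adjacent to one another, which is the kind of arrangement scored as a composite in the analysis that follows.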

[0178] Twenty of the most active Ran18 sequences from the SLA set showed 78 matches with known motifs (Table 1; SEQ ID NOS: 20 to 39). A significant number of these matches occurred as composites consisting of two or more motifs that either were overlapping or contiguous. The two most active elements, MS44 (SEQ ID NO: 20) and S173 (SEQ ID NO: 21), registered 6 and 5 matches, respectively, with known motifs and contained a composite made up of ETS, AP1, CREB, and GATA motifs. These results indicate that a composite motif arrangement can contribute significantly to the high level of activity produced by these synthetic promoters.

[0179] An analysis of the complete SLA, S, and U sets was performed to compare the number of matches, the distribution of motifs, and the number and type of composite elements. Overall, the SLA and S sets contained approximately twice as many motifs as the U set. A significant proportion of the motifs identified in all three sets (46% for U, 46.5% for S, and 51% for SLA) were made up of only eight motifs, which represented putative binding sites for eight different families of transcriptional regulators: AP2, CEBP, GRE, E-box, ETS, CREB, AP1, and SP1/MAZ. The SLA and S sets also contained approximately twice as many of these motifs as the U set. A comparison of the occurrences of each of the 8 most frequent motifs among the three sets revealed a significant increase in the number of Ebox, ETS, CREB, AP1, and SP1/MAZ motifs in SLA and S sets as compared to the U set, but no significant increase in the number of GRE and CEBP motifs.

[0180] The total number of composites increased approximately 2.8-fold in both the SLA and S sets over the number found in the U set. Composites were further categorized into three types: category A, including those containing two or more of the 8 most common motifs; category B, including those containing one of the 8 common motifs and a motif other than one of the 8 common motifs; and category C, including those containing two or more motifs other than the 8 most frequent motifs.

TABLE 1

NAME | RANDOMER SEQUENCE (SEQ ID NO) | MOTIF | ACTIV.
MS44 | cgctcgCCTGTCCGCCGCACTTGTtggatcacgcgtgatccaCCAGGAAGTGACGTATCAcgagcg (20) | SP1, EBOX, ETS, TRE, CREB, GATA | 56
S173 | cgctcgCAACTCTTTCCCCCCCCCtggaccacgcgtgatccaCCAGGAAGTGACGTATCAcgagcg (21) | MAZ, ETS, TRE, CREB, GATA | 48
MS72 | gatccaGGGAGGGGTAGGGTCTATcgagcgacgcgtcgctcgTCTCCTCTACACCCGCTGtggatcacgcgtcgctcgTTGCCCTCCCCTTCCTCAtggatcacgcgtcgctcgCTGTCCCCGCCCCACTCCtggatc (22) | MAZ/SP1, EA1, GATA, ETS, MAZ, ETS, GRE, SP1, P300 |
MS143 | gatccaAGAGCGGGCAGGGATTGGcgagcgacgcgtcgtcgctcgTCCCGCCCCCTCTATGCTtggatcacgcgtcgctcgTCCTCTTCTTTCCTTCCCtggatc (23) | UPA, CEBP, SP1, GRE, IE1, ETS | 43
MS115 | cgctcgGCCCCGCCCTCTTCCCCCtggatc (24) | SP1, GRE | 39
S107 | cgctcgCTCTTGTGTACCTCTCCTtggatcacgcgtcgctcgCCATCTTCTGTCGCTGCtggatc (25) | HES, ETS, CF1, GRE, YY1 | 24
MS91 | cgctcgTCTCTTCTCGCCCCCCCCtggatc (26) | GRE, AP2 | 22
MS137 | cgctcgCCCCTCCCCTAAGCGCGTtggatcacgcgtgatccaACGGGCAATGAAACGAATcgagcg (27) | MAZ, TBF, myb, ECR | 16
S125 | cgctcgCTGGCCCCGCCCTTAGTTtggatcacgcgtcgctcgACCCCGCCTTTCGTATCTtggatc (28) | SP1, SRY, GATA | 15
MS165 | cgctcgTCGCCTGGGTTCTGCTACtggatcacgcgtgatccaGAAGAGCGGAAGGAGGGAcgagcg (29) | AP2, CP2, GRE, SP1 | 12
MS144 | cgctcgCCTTCCCTTACTTCACGCtggatc (30) | CEBP/CREB | 12
MS19 | cgctcgCCTCACGCGAATTCCCCCtggatcacgcgtgatccaGAGAAGGGAGGGGGGGAcgagcg (31) | NFKB, MAZ/SP1 | 11
MS113 | gatccaGGGGCAAAAAGGGAGGGGcgagcg (32) | MAZ/SP1 | 10
MS25 | gatccaGGTGGGGCTAGTGACGTGcgagcg (33) | EBOX, SP1, CREB | 10
S153 | gatccaGATAGACGGGAGTGAAAAcgagcgacgcgtgatccaAGCGGAGGAGGGATGTGAcgagcg (34) | GATA, SIF1, P300, SP1, CREB | 9
S158 | gatccaATCAAGGAGGAGGGATAGcgagcgacgcgtcgctcgTTTCCGGTCTTATGTTTGtggatc (35) | PBX, SP1, GATA, ETS, HNF5 | 9
S185 | cgctcgCCCCCCGCCCTCTTTGCCtggatcacgcgtgatccaGGTGGGGCTAGTGACGTGcgacgc (36) | SP1, EBOX, SP1, CREB | 9
MS123 | gatccaGAAAAGTGAGGGGAGGGGcgagcg (37) | TRE, MAZ/SP1 | 9
MS77 | gatccaGGGACAGTGAGGGGGGGAcgagcgacgcgttgctcgTCCATTTCACGCCCCCGCtggatc (38) | GRE, MAZ, CF1, E2F, KROX | 8
MS135 | gatccaACTGGAGAGTAACGCCCTcgagcg (39) | EBOX, TRE, SP1 | 8

[0181] A comparison of these three categories over the three sets of synthetic promoters revealed a dramatic increase in the number of category A composites in the SLA and S sets (3 and 5.7-fold, respectively) over that observed in the U set as well as in category B composites (2.7-fold for SLA and S sets). Category C composites also increased in the S set as compared to the U set (about 2.4-fold), but only increased 1.4-fold in the SLA set. These analyses indicate that composites containing one or more of the 8 frequent motifs correlate favorably with highly active synthetic promoters.
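The three composite categories can be expressed as a small classification rule. The sketch below is an illustration of the definitions of categories A, B, and C; the motif labels used for the eight frequent motifs and the function name are assumptions, and the example composites are hypothetical.

```python
# Labels for the eight most frequent motifs, written as in the text.
COMMON_MOTIFS = {"AP2", "CEBP", "GRE", "EBOX", "ETS", "CREB", "AP1", "SP1/MAZ"}


def composite_category(motifs):
    """Classify a composite (two or more motifs) as category A, B, or C."""
    if len(motifs) < 2:
        raise ValueError("a composite contains at least two motifs")
    n_common = sum(1 for motif in motifs if motif in COMMON_MOTIFS)
    if n_common >= 2:
        return "A"  # two or more of the 8 most common motifs
    if n_common == 1:
        return "B"  # one common motif plus at least one other motif
    return "C"      # only motifs outside the 8 most common


# Hypothetical composites used only to exercise the rule.
examples = [["ETS", "CREB"], ["ETS", "GATA"], ["GATA", "P300"]]
categories = [composite_category(example) for example in examples]
print(categories)
```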

[0182] The number of composites containing each of the 8 frequent motifs also was determined. In synthetic promoters of the SLA and S sets, as compared to the U set, the number of composites containing GRE, Ebox, AP1, CREB, and SP1/MAZ motifs increased dramatically and those containing ETS increased moderately. However, no increase was observed in the number of composites containing AP2 and CEBP elements. Taken together with the results described above showing the increased Ebox, CREB, AP1, and SP1/MAZ motifs in the SLA and S sets, these results demonstrate that 1) increases in both number and presence in composites of the E-box, AP1, ETS, CREB, and SP1/MAZ motifs were correlated with active synthetic promoters; 2) an increase in the occurrence of GRE elements in composites, but not in their abundance, was correlated with active synthetic promoters; and 3) there was no correlation between either the number or the presence in composites of AP2 and CEBP elements with activity of synthetic promoters. Of the active Ran18 sequences from the SLA and S sets, 4% and 11%, respectively, showed no matches to known transcriptional regulatory motifs. As such, these sequences represent novel regulatory elements.

[0183] To determine whether some of the 8 most frequent motifs identified within the Ran18 sequences actually contributed to DNA binding and promoter activity, gel mobility-shift and promoter assays were performed on native and mutated versions of the ETS, CREB, and MAZ/SP1 motifs in the synthetic promoters MS44 (SEQ ID NO: 20) and MS113 (SEQ ID NO: 32; see Table 1). The right hand element found in MS44 (designated MS44B) and the Ran18 element in MS113 were examined for binding to Neuro2A nuclear extracts. MS44B contains an ETS/CREB composite and MS113 contains a MAZ/SP1 motif.

[0184] Gel mobility-shift experiments using the MS44B probe revealed high and low molecular weight DNA/protein complexes. Formation of high and low molecular weight complexes was eliminated in ³²P-labeled variants of the MS44B sequence, ΔC and ΔE, which have multiple base pair substitutions in the CREB and ETS motifs, respectively. A probe having both ETS and CREB mutations (ΔEΔC) showed no binding to proteins in nuclear extracts of Neuro2A cells. Experiments that included these and mutated versions of these motifs as cold competitors in binding reactions provided similar results. These results indicate that the proteins involved in the higher and lower molecular weight complexes represent members of the CREB and ETS families of proteins, respectively. ETS and CREB mutations in MS44B also resulted in substantial reductions of MS44B promoter activity. Luciferase reporter variants of MS44B with mutations in the ETS, the CREB, or in both ETS and CREB motifs had only 27%, 5%, and 3%, respectively, of the promoter activity of MS44B.

[0185] Similar binding and activity assays were performed to investigate the contribution of the SP1/MAZ motif in the MS113 promoter (SEQ ID NO: 32; Table 1). Mutation of the SP1/MAZ motif resulted in a complete elimination of DNA binding of Neuro2A nuclear proteins to the MS113 element. A variant of the MS113 synthetic promoter containing these SP1/MAZ mutations showed only 18% of the promoter activity of MS113. Collectively, these experiments indicate that the ETS/CREB composite and SP1/MAZ motifs identified in searches of the TransFac database with the RIGHT software are major contributors to both the binding and activity of the synthetic promoters in which they were found.

[0186] SPCM was designed to address several problems confronted in analyzing the complex machinery of eukaryotic gene transcription. A basic problem is to survey the types and frequencies of DNA motifs that contribute to promoter activity. As such, it is important to understand which combinations of cis and trans elements work in concert with a core promoter and the basic transcription machinery in a given cellular context. The present results demonstrate that the disclosed methods can be used to identify functional motifs active in the context of a cell, including in various cell types, under a variety of conditions, and in various combinations.

[0187] After GFP selection of 480 sequences, 120 had greater than 4-fold activity over that of the minimal promoter in luciferase assays. The RIGHT software package was used to analyze the occurrence of various motifs in three different sets of synthetic promoters: unselected (the U set), those selected by GFP fluorescence to have promoter activity as integrants in the genome (the S set), and GFP-selected synthetic promoters that, as measured after cellular transfection, gave high levels of activity in an episomal state with the luciferase assay (the SLA set). Approximately twice as many matches with known transcriptional regulatory motifs were found in the SLA and S sets as were found in the U set. Fifty-one percent of the matches were with eight different motifs: AP2, CEBP, GRE, Ebox, ETS, CREB, AP1, and SP1/MAZ. The most active sequences were made up of composites of these eight motifs, including the two most active sequences, both of which contained overlapping ETS and CRE motifs. A BLAST search for occurrence of this composite in natural promoters revealed an exact match with an element in the proximal promoter of a gene encoding a non-structural protein from the parvovirus B19 (Zakrzewska et al., GenBank Accession No. AF190208, 1999).
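The kind of motif survey summarized above can be illustrated with a minimal Python sketch that scans each sequence on both strands and tallies, per set, how many sequences contain each motif. This is not the RIGHT software and does not query the TransFac database. The consensus patterns used (Ebox CANNTG, CRE TGACGTCA, AP1 TGASTCA, SP1 GGGCGG) are textbook approximations assumed for illustration, and the example sequences are hypothetical placeholders rather than sequences from the U, S, or SLA sets.

```python
import re
from collections import Counter

# Assumed textbook consensus patterns; not the TransFac matrices.
MOTIFS = {
    "Ebox": "CANNTG",
    "CREB": "TGACGTCA",
    "AP1": "TGASTCA",
    "SP1": "GGGCGG",
}
# Minimal IUPAC-to-regex table covering only the codes used above.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "S": "[CG]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def motif_regex(consensus):
    return re.compile("".join(IUPAC[base] for base in consensus))


def motifs_in(seq):
    """Return the names of all motifs found on either strand of seq."""
    seq = seq.upper()
    strands = (seq, reverse_complement(seq))
    return {
        name
        for name, consensus in MOTIFS.items()
        if any(motif_regex(consensus).search(strand) for strand in strands)
    }


def tally(sequences):
    """Count, for each motif, the number of sequences that contain it."""
    counts = Counter()
    for seq in sequences:
        counts.update(motifs_in(seq))
    return counts


# Hypothetical placeholder 18mers, not taken from the disclosure.
selected = ["AACAGCTGTTGACGTCAT", "TTGGGCGGAATGAGTCAA"]
unselected = ["ATATATATATATATATAT"]
print("selected:", dict(tally(selected)))
print("unselected:", dict(tally(unselected)))
```

A sequence for which `motifs_in` returns two or more names corresponds loosely to a composite in the sense used above, although exact-consensus matching of this kind is much cruder than a matrix-based database search.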

[0188] Of the eight prevalent known motifs identified using SPCM, several, including SP1, function within the core promoter (see, for example, Parks and Shenk, J. Biol. Chem. 271:4417-4430, 1996; Segal et al., J. Mol. Evol. 49:736-749, 1999). Others such as ETS and CRE are components of enhancers. Thus, the SPCM method provides a means to identify motifs that can act due to direct contributions to a core promoter and that can function within an enhancer.

[0189] The present method allows for separate determinations of the activity of a motif when integrated in the genome or in the episomal state. Of 480 integrated motifs that were selected as active by GFP-sensitive cell sorting, 120 exceeded the 4-fold threshold as plasmids in the luciferase assay. Thus, the present method provides a means to identify regulatory elements that function only when integrated in a genome. The possibility that some of the activities seen in the integrated state arose because of proximity to unknown enhancers raises the issue of false positive responses.

[0190] As an alternative to retroviral infection and integration, which requires cell division, transfection and selection for resistance to the antibiotic Zeocin® were used to construct stable cell lines, and the results achieved were similar to those reported using the retroviral vector. However, integration of promoter constructs was less efficient than when a retroviral vector was used. The use of retroviruses allows application of SPCM to cells in an organism in vivo, thus providing a means to identify regulatory elements that are active only during particular stages of development.

[0191] Variations to the present method include, for example, the screening of libraries constructed from different lengths of randomers to minimize potential biasing. Moreover, the use of a larger cell sample can improve statistical analysis of the prevalence of particular motifs. In addition, application of the present method to screening in various cell types and species can elucidate evolutionary changes in regulatory elements that occur, for example, as a result of speciation events, thus providing a means to classify an unknown sample. Consistent application of the current and related SPCM approaches should allow the creation of databases of truly functional promoters that also include cognate information on various species and developmental states.

[0192] Several extensions of the SPCM procedure can be useful. For example, in addition to the selection of random DNA sequences of a particular length, the method can be used to analyze combinations of a single known motif such as an Octamer element with random sequences, thus providing a means to identify synergies between various cis acting regulatory elements and the modulation of interactions with corresponding transcription factors. Moreover, the deliberate assemblage of combinations of known elements in various lengths, orders, polarities, and spacings can provide a means to obtain regulatory elements having desirable characteristics. Selected transcriptional regulatory elements or combinations thereof as disclosed herein, for example, in matrix arrays, can be used to detect differential responses of normal cells and cells from various diseased tissues for diagnostic purposes or drug development.

EXAMPLE 3 Modification of the Transcriptional Regulatory Element Selection System

[0193] This example demonstrates that various vector constructs and reporter molecules can be used for identifying synthetic transcriptional regulatory elements.

[0194] The retroviral vector, MESVR/EGFP*/IRESNCAMPro(ori) (SEQ ID NO: 9), was made essentially by substituting a cDNA sequence encoding the 140 kD form of the human neural cell adhesion molecule, N-CAM, for the Pac coding sequence in the MESVR/EGFP*/IRESpacPro(ori) vector (SEQ ID NO: 1). The entire N-CAM cDNA was generated by PCR using 5′ and 3′ primers having Afl III and Sal I restriction sites, respectively. The selection system based on N-CAM uses an anti-N-CAM antibody, which immunoreacts with eukaryotic cells that are expressing N-CAM under the control of an introduced synthetic oligonucleotide having transcriptional promoter activity. Selection can be performed, for example, by fluorescently labeling the anti-N-CAM antibody, contacting the cell with the antibody, and using a method such as FACS to select retrovirally infected cells expressing the N-CAM marker.

[0195] The disclosed selection method also can be practiced using other expression vectors, including variants of the disclosed retroviral vectors. For example, the adenovirus major late promoter can be substituted with another minimal promoter such as the minimal enkephalin gene promoter (MEK). In addition, a nucleotide sequence encoding a reporter protein other than EGFP or puromycin N-acetyltransferase can be used. For example, EGFP can be substituted with GFP or another fluorescent reporter, or with luciferase or another easily detectable reporter. Similarly, the nucleotide sequence encoding puromycin N-acetyltransferase can be substituted with one encoding hygromycin B phosphotransferase, which confers resistance to hygromycin B; the Sh ble gene product, which confers resistance to the antibiotic Zeocin® (bleomycin); or neomycin (aminoglycoside) phosphotransferase, which confers resistance to the aminoglycoside antibiotic G418. Non-retroviral expression vectors also can be used, and similarly are designed to contain one or more polynucleotides encoding selectable markers such that cells containing an integrated form of the vector can be selected.

[0196] Additional exemplary vectors useful in the disclosed methods are provided. The pnZ-MEK vector (SEQ ID NO: 2; see, also, FIG. 2A) contains a MEK minimal promoter and nucleotide sequences encoding the prokaryotic Sh ble gene product and the neomycin (aminoglycoside) phosphotransferase, which confer resistance to the antibiotics Zeocin® and G418, respectively. The pnZ-MEK vector also contains unique Pst I and Not I restriction sites, into which an oligonucleotide to be tested for transcriptional regulatory activity, for example, Ran18 or Ran12 cassettes or other putative regulatory elements, can be inserted. Elements are cloned upstream of the MEK promoter, which in turn lies upstream of the Zeocin (bleomycin) resistance gene.

[0197] The pnL-MEK vector (see FIG. 2B) is similar to pnZ-MEK, except that it contains a luciferase reporter gene substituted for the Sh ble gene, and can be used to corroborate the activity of regulatory elements that are selected in the procedure. An additional vector, pnH-MEK, was constructed by substituting the Sh ble sequence of pnZ-MEK (or the luciferase reporter gene of pnL-MEK) with a sequence encoding hygromycin B phosphotransferase, which confers resistance to hygromycin B (SEQ ID NO: 3; see, also, FIG. 2C). Each of these vectors contains a gene encoding neomycin resistance (aminoglycoside) phosphotransferase, which is driven by the strong SV40 early promoter. The neomycin resistance gene cassette allows selection for integration of a construct in the cellular genome using G418. In addition, the vectors contain a sequence encoding β-lactamase (bla), which confers resistance to kanamycin and allows for selection of the vectors in bacterial cells.

[0198] To confirm the utility of the above described expression vectors, a library of random 12mers was screened. Single stranded oligonucleotides containing a core of twelve random bases (Ran12) were synthesized using an Applied Biosystems DNA synthesizer. To prepare the Ran12 oligonucleotides for cloning, two additional primers (linkers) complementary to the Pst I and Not I portions of the single stranded oligonucleotide were synthesized and annealed to the Ran12 oligonucleotides. The annealing forms a hemiduplex DNA molecule that contains double-stranded ends that are compatible with Pst I and Not I restriction sites and a single-stranded portion that corresponds to the Ran12 core.

[0199] Annealing was performed with a 50-fold molar excess of the two primers relative to the Ran12 oligonucleotide in a solution containing Tris-HCl (pH 7.5), 1 mM MgCl₂ at 75° C. for 10 min, followed by slow cooling to room temperature. The library of Ran12 oligonucleotides was ligated into either the pnL-MEK or pnZ-MEK vectors in a 1:1 ratio of Ran12 oligonucleotide to vector. Generally, 100 to 500 ng of vector was used in each ligation in a volume of 10 μL. DNA was then purified using QiaQuick PCR purification columns (Qiagen) and 10% of the ligation mixture was used to transform frozen competent XL10 Gold E. coli (Stratagene). DNA polymerase I in the bacteria fills in the hemiduplex, thus producing a double stranded Ran12 sequence. Equal portions of the transformation mix were plated on 150 mm LB plates containing kanamycin.
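The molar ratios described above (a 50-fold molar excess of each primer for annealing and a 1:1 ratio of oligonucleotide to vector for ligation) can be converted to masses with a short calculation. The following Python sketch is illustrative only. It assumes the common approximations of about 650 g/mol per base pair of double stranded DNA and about 330 g/mol per nucleotide of single stranded DNA, and the vector length, oligonucleotide length, and primer length passed in the example call are hypothetical values that are not stated in the text.

```python
DS_BP_MASS = 650.0  # g/mol per base pair of dsDNA (common approximation)
SS_NT_MASS = 330.0  # g/mol per nucleotide of ssDNA (common approximation)


def pmol_dsdna(ng, length_bp):
    """Convert a mass of double stranded DNA (ng) to picomoles."""
    return ng * 1000.0 / (length_bp * DS_BP_MASS)


def ng_ssdna(pmol, length_nt):
    """Convert picomoles of a single stranded oligonucleotide to ng."""
    return pmol * length_nt * SS_NT_MASS / 1000.0


def ligation_plan(vector_ng, vector_bp, oligo_nt, primer_nt, primer_excess=50):
    """Amounts for a 1:1 oligo:vector ligation and a primer_excess-fold annealing."""
    vector_pmol = pmol_dsdna(vector_ng, vector_bp)
    oligo_pmol = vector_pmol                  # 1:1 molar ratio to vector
    primer_pmol = primer_excess * oligo_pmol  # applies to each of the two primers
    return {
        "vector_pmol": vector_pmol,
        "oligo_pmol": oligo_pmol,
        "oligo_ng": ng_ssdna(oligo_pmol, oligo_nt),
        "primer_pmol_each": primer_pmol,
        "primer_ng_each": ng_ssdna(primer_pmol, primer_nt),
    }


# Hypothetical inputs: 100 ng of a 5,000 bp vector, a 40 nt Ran12
# oligonucleotide, and 14 nt primers. None of these lengths is from the text.
for key, value in ligation_plan(100, 5000, 40, 14).items():
    print(f"{key}: {value:.4f}")
```

Because the insert is so much shorter than the vector, a 1:1 molar ratio corresponds to a very small mass of oligonucleotide, which is why the calculation is done in moles rather than in mass.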

[0200] Several cell lines can be used for transfection. The P19 cell line is a model system for the study of neuronal and muscle cell differentiation. In the presence of retinoic acid, the embryonal P19 cells differentiate into glial cells and neurons, whereas in the presence of DMSO, P19 cells differentiate into skeletal and cardiac muscle cells. Furthermore, these cells differentially express genes that are important to these induction processes. Regulatory elements identified as active in the P19 differentiation system can be tested in other cell lines of known phenotype to further define the role of the element in a particular step of differentiation. Other cell lines that can also be induced to differentiate include NG108-15 and Neuro2A cells. Although the latter cells are not pluripotent as is the P19 cell system, they provide a means to focus on more specific differentiation events within the nervous system.

[0201] The Ran12/pnZ-MEK constructs were introduced into P19 cells by electroporation using a BIORAD Gene Pulser, which results in the insertion of the expression constructs into one site within the genome. Electroporation was performed in either growth medium or Opti-MEM using 10 μg of linearized DNA in 15×10⁶ cells. After electroporation, stably transfected cells were selected in 10 cm dishes in the presence of both G418 (0.2 mg/ml) and Zeocin® (0.1 mg/ml). Cells were selected for 2 weeks and colonies that survived were transferred to 96 well plates. Once stable cell lines were established, cells were induced to differentiate within the 96 well plates. In the first set of isolated Ran12 promoter elements, four million synthetic Ran12 elements were screened, and one thousand Zeocin-resistant cell colonies were isolated.
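To put the scale of this screen in context, the number of possible 12mers is 4¹² = 16,777,216, so four million screened elements can sample at most about a quarter of the Ran12 sequence space. The following Python sketch estimates the expected fraction of distinct sequences sampled. It assumes, purely for illustration, that every screened element is an independent and uniformly random draw from the sequence space, which real oligonucleotide synthesis and cloning only approximate.

```python
import math


def sequence_space(length):
    """Number of distinct DNA sequences of the given length."""
    return 4 ** length


def expected_coverage(n_screened, length):
    """Expected fraction of the sequence space sampled at least once,
    assuming independent, uniformly random draws (an idealization)."""
    space = sequence_space(length)
    # 1 - (1 - 1/space) ** n_screened, computed stably via log1p
    return 1.0 - math.exp(n_screened * math.log1p(-1.0 / space))


print(sequence_space(12))  # 16777216
print(sequence_space(18))  # 68719476736
print(f"{expected_coverage(4_000_000, 12):.3f}")  # roughly one fifth
```

Under these assumptions, roughly one fifth of all possible 12mers would be represented at least once among four million screened elements, whereas the same number of Ran18 elements would cover only a tiny fraction of the 18mer space.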

[0202] Cell lines were analyzed to identify the combinations of known elements or novel regulatory elements that allowed sufficient Zeocin® resistance gene expression for survival. Cells were cultured in 96 well plates and genomic DNA was isolated using a Chelex lysis procedure and purified using the QiaAmp Tissue Kit (Qiagen). Regulatory elements were amplified by PCR (two rounds of 25 cycles) using primers that flank the regulatory element cassette. The amplified regulatory cassette was sequenced directly using an automated DNA sequencer. To independently assay the activity of elements selected in stable lines, each cassette was cloned into the pnL-MEK vector in order to confirm and quantitate the activity of individual elements. Each luciferase reporter containing an element was transiently transfected into cells using Lipofectamine (Life Technologies) and luciferase activity was assayed 48 hr later using an enzymatic assay and detected on a luminometer.

[0203] A number of synthetic regulatory constructs that functioned well in a particular cell type and cell culture environment were identified and compared to others selected from cells cultured in a different environment to determine profiles of regulatory elements that function best in a particular cell and culture environment. Representative Ran12 sequences obtained by this selection procedure are shown in Table 2 (SEQ ID NOS: 40 to 82).

TABLE 2

1. Elements that resemble known transcription factor binding sites

Sequence        Factor          SEQ ID NO
Homeodomain factor binding sites
GGCATTCATCGT    Pit-1a          40
GCATTAGTATCT    Isl-1           41
CAAT box
TCGGTTATTGTT                    42
TCCAATTGGGAA                    43
ATCTATTGGCCA    gamma CAAT      44
CACCC box
TTACTGGGTGTT                    45
AGGGTGAAGGTC                    46
GGTGGGTGTGTC                    47
c-myb
CGCTTCAATGCT                    48
TGCTTCAATGCC                    49
Hormone response elements (HRE)
TGTGTCTTTGCA    GR              50
CACGGGGACAGC    GR              51
AAGCTGTACATG    GR PR           52
GATGGGGGCACA    GR              53
ATATGTGCCCTT    GR              54
TCCTTCTGGGTC    ER fos/jun      55
GGTGGGTGTGTC    GR AP1          56
Other
GAATGGATGGG     AP-2            57
CATGTGATATTC    USF             58
AGGAGGGTTTGT    C/EBP alpha     59
TGGGCGAGTGGG    Zeste           60
CGGCTCACCAGT    Zeste           61
GGTTTCTATAAC    TBP             62

2. Novel elements

Sequence        Note            SEQ ID NO
Repeated core motifs
GGTGGGTGTGTC                    63
TTACTGGGTGTT                    64
AAGTCTTGGGT                     65
GGTTGGGTCCCC                    66
TTGGGTCATTGT                    67
TTGGGTCGTTGT                    68
TCTGGGTCGCGC                    69
TCCTTCTGGGTC                    70
CCTTTGTGGGTC                    71
TCACTTCTGGGC                    72
CTAGTGGGAGCT                    73
TGGGCGAGTGGG                    74
TGCTTCAATGCC                    75
AGGGTGAAGGTC                    77
ACCCGGGGAAGG                    78
TGTGTCTTTGCA                    79
CGAACTTTGCAA                    80
Elements found more than once
TTGGGTCGTTGT    found 4 times   68
TGAGTAAGCTAT    found twice     81
TATGTAAGAACG    found twice     82
TCGGTTATTGTT    found twice     42

[0204] Elements that resembled portions of the binding sites for known transcriptional regulatory proteins, including homeodomain binding sites, CCAAT boxes, CACCC boxes, binding sites for c-myb, hormone response elements (glucocorticoid receptor, progesterone receptor and estrogen receptor), binding sites for the products of immediate-early genes such as fos and jun, for AP-2, and for other factors including C/EBP, USF, Zeste, and TBP, are indicated. Elements that were selected by this procedure, but that do not contain otherwise identifiable known binding sites for transcription factors, also are indicated. Remarkably, several different core motifs were identified, including TTGGGT (SEQ ID NO: 83) present in SEQ ID NOS: 63 to 71, CTAGTGGG (SEQ ID NO: 84) present in SEQ ID NOS: 72 to 74, ATGCC (SEQ ID NO: 85) present in SEQ ID NOS: 75 and 76, GAAGG (SEQ ID NO: 86) present in SEQ ID NOS: 77 and 78, and CTTTTGCA (SEQ ID NO: 87) present in SEQ ID NOS: 79 and 80 (see Table 2). In addition, some Ran12 sequences were obtained more than once (see Table 2; SEQ ID NOS: 42, 68, 81 and 82). These results confirm the general utility of the disclosed methods for identifying transcriptional regulatory elements having a variety of lengths.
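One simple way to look for repeated core motifs among selected sequences is to list the short subsequences (k-mers) that recur across them. The following Python sketch shows the idea on four of the sequences listed in Table 2 (SEQ ID NOS: 65 to 68). It is an illustration under stated assumptions, not the analysis actually used: it compares only the strand as written, requires exact matches, and the function names and the choice of k are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """All distinct subsequences of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(sequences, k, min_sequences=2):
    """Map each k-mer to the names of the sequences containing it,
    keeping only k-mers present in at least min_sequences sequences."""
    seen = defaultdict(set)
    for name, seq in sequences.items():
        for kmer in kmers(seq, k):
            seen[kmer].add(name)
    return {kmer: names for kmer, names in seen.items()
            if len(names) >= min_sequences}


# Four Ran12-derived sequences as listed in Table 2.
table2_subset = {
    "SEQ65": "AAGTCTTGGGT",
    "SEQ66": "GGTTGGGTCCCC",
    "SEQ67": "TTGGGTCATTGT",
    "SEQ68": "TTGGGTCGTTGT",
}

core = shared_kmers(table2_subset, k=6, min_sequences=4)
print(sorted(core))  # ['TTGGGT']
```

Lowering `min_sequences` or `k` reports weaker or shorter shared cores, at the cost of more chance matches among short random sequences.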

EXAMPLE 4 Selection of Synthetic Translational Regulatory Elements

[0205] This example describes the preparation of a vector useful for selecting oligonucleotide sequences having internal ribosome entry site (IRES) activity and the identification and characterization of such selected elements.

[0206] The disclosed synthetic IRES methodology provides a means for selecting functional IRES elements. Similar to the transcriptional regulatory element selection method disclosed above (Examples 1 to 3), the IRES selection method allows the parallel screening of 1×10⁶ to 1×10¹⁰ or more random oligonucleotide elements or combinations of elements for activity in mammalian cells. Selection of synthetic IRES elements in mammalian cells is facilitated if 1) each cell receives a single unique cassette to avoid selection of inactive elements that are fortuitously present in the same cell as an active element; 2) the delivery system is efficient so that a complex library can be readily screened; and 3) the selection process is stringent and is based on a reporter gene assay that is highly sensitive and faithfully reports the activity of the IRES elements.
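The first condition above, that each cell receive a single cassette, can be explored with a simple statistical model. The following Python sketch assumes that the number of integrated cassettes per cell follows a Poisson distribution with mean equal to the multiplicity of infection (MOI). Both the Poisson model and the MOI values shown are assumptions made for illustration and are not taken from the disclosure.

```python
import math


def fraction_infected(moi):
    """Fraction of cells with at least one cassette (Poisson model)."""
    return 1.0 - math.exp(-moi)


def fraction_single_among_infected(moi):
    """Among infected cells, the fraction carrying exactly one cassette."""
    return moi * math.exp(-moi) / fraction_infected(moi)


# Hypothetical MOI values chosen only to show the trend.
for moi in (0.05, 0.1, 0.5, 1.0):
    print(f"MOI {moi}: infected {fraction_infected(moi):.3f}, "
          f"single-copy among infected {fraction_single_among_infected(moi):.3f}")
```

Under this model, lowering the MOI raises the proportion of infected cells that carry a single cassette but lowers the proportion of cells infected at all, so the first two conditions listed above pull in opposite directions.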

[0207] As disclosed herein, a library of oligonucleotides was ligated immediately upstream of the second nucleotide sequence of a dicistronic reporter cassette comprising two reporter genes by insertion into a cloning site in the intercistronic spacer sequence. The exemplified reporter cassette (see below) contained nucleotide sequences encoding enhanced green fluorescent protein (EGFP) and enhanced cyan fluorescent protein (ECFP), which were arranged in a dicistronic construct that allows two separate gene products to be made from a single mRNA that is driven by a single promoter. After infection of cells with the retroviral IRES element library and integration into the genome, each IRES was scored for its translational activity by examining the activity of the ECFP reporter gene relative to that of EGFP. After 2 to 3 days of infection, infected cells were selected by FACS to obtain cells expressing both EGFP and ECFP; the level of ECFP expression in each cell reflected the strength of an individual synthetic IRES element cassette, such that highly fluorescent cells are likely to contain highly active IRES elements. After multiple rounds of selection, the IRES sequences were amplified from the cellular genome by PCR and sequenced using an automated DNA sequencer to determine the identity of each of the synthetic IRES elements. The activity of each selected IRES element was confirmed by amplifying the entire IRES element, inserting the amplified element into a dicistronic luciferase reporter vector, and screening for the second luciferase reporter protein under translational control of the inserted IRES. This method allowed the testing of the regulatory cassette in a different reporter system, which was more amenable to quantitation of IRES activity levels.

[0208] The benefits of using a retroviral delivery system for identifying synthetic IRES elements are similar to those described in Example 1 for identifying transcriptional regulatory elements. The recombinant retroviral vector designed for the IRES selection procedure was designated MESVR/EGFP/ECFP/RSVPro (SEQ ID NO: 109; see, also, FIG. 3). This vector was based on the MESV/IRESneo vector (Owens et al., supra, 1998; Mooslehner et al., supra, 1990; Rohdewohld et al., supra, 1987), similarly to the MESVR/EGFP*/IRESpacPro(ori) vector (SEQ ID NO: 1) described in Example 1.

[0209] Features of the MESVR/EGFP/ECFP/RSVPro vector include that 1) a multiple cloning site was introduced into the downstream LTR for insertion of exogenous sequences that can regulate transcriptional activity of a transgene encoded by the recombinant retrovirus, and the endogenous viral core promoter was replaced with a strong basal promoter to potentiate transcription promoting activity of inserted sequences; 2) a mutated EGFP encoding sequence was introduced, followed by a multiple cloning site to allow insertion of sequences to be tested and by a sequence encoding ECFP to allow assay of translational activity on a single cell basis; 3) enhancer elements in the upstream LTR were replaced with those from RSV to drive higher levels of RNA genome production in the packaging cells; and 4) an SV40 origin of replication was inserted in order to increase the copy number of the retroviral plasmids in the packaging cells. The EGFP and ECFP reporter genes are expressed as a single transcript, in which the two coding sequences are linked by an oligonucleotide to be examined for IRES activity. Expression of both reporter genes is controlled by a strong RSV promoter to ensure efficient transcription of the RNA viral genome and, therefore, a high viral titer. The multiple cloning site between the EGFP and ECFP coding sequences facilitates the insertion of an oligonucleotide to be examined for translational activity.

[0210] Except as indicated, methods were performed essentially as described in Examples 1 to 3. A pool of random 18mers, flanked on either side by two different invariant sequences each 6 base pairs in length, was prepared and inserted into the Mlu I site in the intercistronic spacer of MESVR/EGFP/ECFP/RSVPro (see FIG. 3; cf. FIG. 1). A library of recombinant retroviruses was made by transiently transfecting COS1 cells with the vector library together with plasmids encoding the MLV gag/pol genes and the VSV G glycoprotein gene. The library was introduced into B104 cells, then 48 hr later, the cells were subjected to FACS and cells expressing high levels of EGFP and ECFP were collected. The selected cells were replated, then sorted again for EGFP and ECFP expression. Genomic DNA was extracted from the twice-selected cells, and the 18mers were isolated by PCR using primers complementary to the sequences flanking the Mlu I cloning site in the vector.

[0211] IRES activity of the PCR amplified sequences was confirmed by cloning the fragments into the intercistronic region of the dicistronic reporter vector, RPh (Chappell et al., Proc. Natl. Acad. Sci. USA 97:1536-1541, 2000, which is incorporated herein by reference). Individual plasmid clones were transfected into B104 cells and the luciferase activities of the first cistron (Renilla luciferase) and the second cistron (Photinus luciferase) were assayed. For a given plasmid clone containing a particular 18mer sequence, an increase in the translation of the second cistron relative to the first cistron and normalized to the empty vector indicated that the 18mer functioned as an IRES element.
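The normalization described above can be written as a ratio of ratios: the Photinus to Renilla ratio for a given clone divided by the same ratio for the empty vector. The following Python sketch shows this calculation. The function name and the luminometer readings in the example are hypothetical and serve only to illustrate the arithmetic.

```python
def ires_activity(photinus, renilla, empty_photinus, empty_renilla):
    """Second-cistron/first-cistron ratio for a clone, normalized to the
    same ratio measured for the empty dicistronic vector."""
    if min(renilla, empty_photinus, empty_renilla) <= 0:
        raise ValueError("luciferase readings used as denominators must be positive")
    clone_ratio = photinus / renilla
    empty_ratio = empty_photinus / empty_renilla
    return clone_ratio / empty_ratio


# Hypothetical relative light units, for illustration only.
activity = ires_activity(photinus=500.0, renilla=1000.0,
                         empty_photinus=50.0, empty_renilla=1000.0)
print(f"IRES activity relative to empty vector: {activity:.1f}-fold")
```

A value near 1 indicates no change relative to the empty vector, and values above 1 indicate increased translation of the second cistron relative to the first.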

EXAMPLE 5 Modification of the Translational Regulatory Element Selection Method

[0212] This example demonstrates that various vectors and reporter cassettes can be used to identify synthetic translational regulatory elements.

[0213] In higher eukaryotes, translation of some mRNAs occurs by internal initiation. It is not known, however, whether this mechanism is used to initiate the translation of any yeast mRNAs. In this example, naturally occurring nucleotide sequences that function as IRES elements within the 5′ leader sequences of Saccharomyces cerevisiae YAP1 and p150 mRNAs were identified. When tested in the 5′ UTRs of monocistronic reporter genes, both leader sequences enhanced translation efficiency in vegetatively growing yeast cells. Moreover, when tested in the intercistronic region of dicistronic mRNAs, both sequences exhibited IRES activity that functioned in living yeast cells. The activity of the p150 leader was much greater than that of the YAP1 leader. The second cistron was not expressed in control dicistronic constructs that lacked these sequences or that contained the 5′ leader sequence of a control (CLN3) mRNA in the intercistronic region. Further analyses of the p150 IRES revealed that it contained several non-overlapping segments that were able independently to mediate internal initiation. These results demonstrate that the p150 IRES has a modular structure similar to IRES elements contained within some cellular mRNAs of higher eukaryotes. Both YAP1 and p150 leaders contained several complementary sequence matches to yeast 18S rRNA.

[0214] The plasmid pMyr (Stratagene) was used as the backbone for both dicistronic and monocistronic constructs. An adaptor containing restriction sites Hind III, Pst I, Nhe I, Eco RI, Nco I, and Xba I was introduced into the pMyr vector immediately downstream of the GAL1 promoter, using Hind III and Xba I as cloning sites. The Pst I and Xba I sites were used as cloning sites for a fragment from the RPh dicistronic reporter vector (Stoneley et al., Oncogene 16:423-428, 1998, which is incorporated herein by reference; Chappell et al., supra, 2000). The resulting construct, pMyr-RP, encodes a dicistronic mRNA that encodes Renilla (sea pansy) and Photinus (firefly) luciferase proteins as the first (upstream) and second (downstream) cistrons, respectively. These cloning steps resulted in a 5′ UTR that differs slightly from that in the RP mRNA described previously (Stoneley et al., supra, 1998; Chappell et al., supra, 2000). The CYC1 terminator sequence contained within the pMyr vector provides signals for termination of transcription and polyadenylation.

[0215] The p150, YAP1, and CLN3 leader sequences were PCR amplified using yeast genomic DNA as a template. These leader sequences were cloned into the intercistronic region of the pMyr-RP vector using Eco RI and Nco I restriction sites that were introduced at the 5′ and 3′ ends of the leader sequences to generate constructs designated as pMyr-p150/RP, pMyr-YAP1/RP, and pMyr-CLN3/RP. A hairpin structure with a predicted stability of −50 kcal mol⁻¹ (Stoneley et al., supra, 1998) was introduced into the 5′ UTR of the dicistronic constructs to generate pMyr-p150/RPh, pMyr-YAP1/RPh, and pMyr-CLN3/RPh. Deletions and fragments of the p150 leader were generated by PCR amplification of the p150 sequence, again using Eco RI and Nco I as cloning sites.

[0216] Monocistronic constructs containing the Photinus luciferase gene were generated in the modified pMyr vector. The Photinus luciferase gene was obtained from the pGL3 control vector (Promega) as an Nco I/Xba I fragment and cloned using these same sites to generate construct pMyr/P. The leader sequences from YAP1, p150, and CLN3 mRNAs, as well as the hairpin structure, were cloned into the pMyr/P vector using the same restriction sites used for the dicistronic constructs. Constructs containing the chloramphenicol acetyl transferase (CAT) gene were cloned into the pGAD10 vector (CLONTECH). The pGAD10 vector was digested with Hind III and an adaptor containing restriction sites Hind III, Pst I, Nhe I, Eco RI, Nco I, and Xba I was introduced into this site, which is immediately downstream of the ADH promoter. The CAT gene was obtained from the pCAT3 control vector (Promega) and cloned into the modified pGAD10 vector using Nco I and Xba I restriction sites. The p150 leader sequence was introduced into this vector as an Eco RI/Nco I fragment to generate the construct designated p150/CAT. The hairpin structure described above was introduced 5′ of this leader sequence to generate the construct designated p150/CATh.

[0217] The yeast strain EGY48 (MATα, his3, trp1, ura3, LexA_(op(X6))-LEU2; CLONTECH) was used throughout the study. Yeast strains harboring the pMyr based plasmids were grown overnight in 4 ml synthetic defined medium (SD) with uracil and glucose. The following morning, cells were harvested, washed with 4 ml H₂O, and grown for 3 hr in 4 ml SD medium without uracil with the addition of 2% galactose and 1% raffinose. Cells harboring the pGAD10-based constructs did not require induction and were cultured in 4 ml SD/Ura glucose medium overnight. Cells were lysed with 1× lysis buffer (diluted freshly from 5× stock; Promega) in tubes with glass beads. Tubes were vortexed twice for 30 sec, then spun in a microfuge at top speed for 3 min at 4° C. The supernatant was recovered and 20 μl of the lysate was used to assay luciferase activities using the dual reporter assay system (Promega). CAT activity was measured using N-butyl CoA according to technical bulletin no. 84 (Promega).

[0218] RNA was isolated from 4 ml cell culture samples. Cells were pelleted, washed with water, and resuspended in 400 μl of TES buffer (100 mM Tris-HCl, pH 7.5, 10 mM EDTA, 0.5% SDS). RNA was extracted using preheated phenol (65° C.); the mixture was vortexed for 1 min and incubated at 65° C. for one hr. Samples were put on ice for 5 min, then centrifuged at 15,000 rpm for 5 min, and the top aqueous phase was collected and re-extracted once with phenol and once with chloroform. RNA was precipitated with isopropanol, and the precipitate was washed with 70% ethanol, dried, and dissolved in water. RNA samples were separated by gel electrophoresis using 1% formaldehyde/agarose gels and transferred to Nytran SuperCharge nylon membrane (Schleicher & Schuell). The blots were probed with a full-length firefly luciferase antisense RNA probe that was labeled with ³²P.

[0219] The 164 nucleotide YAP1 leader sequence (SEQ ID NO: 88) was examined for translational regulatory activity in the 5′ UTR of a firefly (Photinus) luciferase reporter mRNA (YAP1/P). Cells were transformed with constructs expressing the parent Photinus (−/P) mRNA, the YAP1/P mRNA, or, as a spacer control, the CLN3/P mRNA, which contains the 364 nucleotide 5′ leader of the CLN3 mRNA. Transcription of these monocistronic mRNAs was under control of the GAL1 promoter; mRNA expression was induced with galactose, cells were lysed after 3 hr, and luciferase activities were determined and normalized to Photinus luciferase mRNA levels. Translation efficiency of the YAP1/P mRNA was approximately 10-fold greater than that of either the control −/P or CLN3/P mRNAs. This result indicates that the YAP1 5′ UTR has translational enhancing activity.

[0220] To determine whether the translation mediated by the YAP1 transcribed leader sequence has a cap-independent component, it was tested in a dicistronic mRNA for its ability to mediate internal initiation. The leader sequence of YAP1 mRNA was placed in the intercistronic region of a dual luciferase dicistronic mRNA and examined for IRES activity. In these mRNA transcripts, the upstream cistron encodes Renilla (sea pansy) luciferase and the downstream cistron encodes Photinus luciferase. Cells were transformed with constructs encoding the parent RP mRNA, or with constructs containing the YAP1 or CLN3 leaders in the intercistronic region of the RP mRNA. The YAP1 leader sequence enhanced the translation of the downstream Photinus luciferase cistron approximately 5-fold relative to that of the RP mRNA. In contrast, the CLN3 leader had almost no effect on the expression of the second cistron relative to that of the RP mRNA.

[0221] Hairpin structures were inserted in the dicistronic constructs upstream of the Renilla luciferase gene to block scanning and, thereby, reduce the translation of this reporter molecule. The hairpin structures blocked Renilla luciferase expression by greater than 90%. Nevertheless, the YAP1 leader permitted translation of the Photinus luciferase gene, even when translation of the Renilla luciferase gene was blocked. This result demonstrates that the YAP1 leader did not increase expression of the second cistron by reinitiation or leaky scanning.

[0222] To exclude the possibility that enhanced expression of the downstream cistron was from shorter, monocistronic mRNAs generated by mechanisms such as RNA fragmentation or an unusual splicing event, RNA was isolated from transformed cells and analyzed by northern blot analysis using a probe to the downstream Photinus luciferase gene. The results demonstrated that the dicistronic mRNAs were intact. Thus, translation of the second cistron was not due to initiation via shorter transcripts. Together, these results demonstrate that the YAP1 5′ UTR comprises a nucleotide sequence that has IRES activity and that has translational enhancing activity.

[0223] The yeast p150 5′ UTR also was examined for translational regulatory activity. The 5′ leader of the mRNA encoding the p150 protein was determined by primer extension analysis to contain 508 nucleotides (SEQ ID NO: 89; see, also, Goyer et al., Mol. Cell. Biol. 13:4860-4874, 1993, which is incorporated herein by reference). This sequence contains 11 open reading frames (ORFs) and does not appear to contain or be part of an intron (Costanzo et al., Nucl. Acids Res. 28:73-76, 2000, which is incorporated herein by reference), consistent with the observation that only 4% of yeast genes contain introns, 90% of which encode ribosomal proteins. The presence of the upstream ORFs in the p150 leader might be expected to inhibit translation by a scanning mechanism.

[0224] The p150 sequence was tested in the 5′ UTR of a monocistronic reporter mRNA. Constructs containing this sequence enhanced the translation efficiency of the reporter gene up to 10-fold. However, the analysis was complicated by the appearance of a second band of approximately 1 kb, which may be a partial degradation product of the luciferase mRNA; this RNA was too short to encode a functional Photinus luciferase protein. Accordingly, the p150 leader was tested in the 5′ UTR of the CAT reporter gene to further evaluate whether it was functioning as a translational enhancer. The results obtained using the CAT reporter construct were similar to those obtained with the Photinus luciferase reporter gene; the p150 leader sequence enhanced the translation efficiency of the CAT reporter gene 9-fold.

[0225] To determine whether any translation mediated by the p150 5′ leader was cap-independent, a hairpin structure was inserted at the 5′ end of this construct. Although the hairpin structure inhibited translation of a control CAT mRNA by greater than 90%, translation mediated by the p150 leader sequence was not inhibited but, instead, was enhanced by approximately 3-fold. The CAT mRNA levels did not appear to be affected. These results demonstrate that the translation mediated by this leader sequence is cap-independent.

[0226] To confirm that translation was cap-independent, the p150 leader was tested in the intercistronic region of the dual luciferase RP dicistronic mRNA. In this location, the p150 leader functioned as a potent IRES, enhancing translation of the downstream Photinus luciferase cistron approximately 200-fold relative to that of the RP parent vector. This increase in Photinus luciferase activity in the p150/RP mRNA resulted in Photinus luciferase protein levels that were approximately twice those of Renilla protein levels.

[0227] Blocking the translation of the upstream Renilla luciferase gene with a hairpin structure resulted in an even greater enhancement of the Photinus:Renilla luciferase ratio, indicating that the translation facilitated by this sequence was not dependent on the translation of the upstream Renilla luciferase cistron. As with the findings with YAP1, the enhanced expression of the downstream cistron was not associated with RNA fragmentation or unusual splicing events.

[0228] The p150 leader sequence was sequentially deleted from the 5′ end and fragmented into shorter segments, including fragments consisting of nucleotides 100 to 508, 160 to 508, 250 to 508, 375 to 508, 429 to 508, 481 to 508, 250 to 390, and 1 to 250 of SEQ ID NO: 89, each of which was tested for IRES activity. Most of the IRES activity was associated with nucleotides 160 to 508. However, all of the fragments examined demonstrated some level of IRES activity. Furthermore, deletion of nucleotides 1 to 100 or nucleotides 100 to 160 increased translation by internal initiation, indicating that this 160 nucleotide region contains translational inhibitory sequences, which can inhibit IRES activity. The leader sequence in construct p150(250-508) corresponds to that of a shorter leader sequence that occurs naturally (Goyer et al., supra, 1993). This shorter leader sequence has a level of IRES activity that is similar to that of the entire 508 nucleotide leader.
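The coordinate bookkeeping for the deletion and fragment series described above can be sketched as follows. This is only an illustration: the fragment boundaries are those listed in the text, but the leader string is a placeholder of the correct length and not SEQ ID NO: 89, and the helper names (extract, fragment_lengths) are hypothetical.

```python
# Fragment boundaries (1-based, inclusive) as listed in the text for the
# 508 nucleotide p150 leader.
LEADER_LENGTH = 508

FRAGMENTS = [(100, 508), (160, 508), (250, 508), (375, 508),
             (429, 508), (481, 508), (250, 390), (1, 250)]


def extract(leader, start, end):
    """Return the sub-sequence for 1-based, inclusive coordinates."""
    if not (1 <= start <= end <= len(leader)):
        raise ValueError("coordinates fall outside the leader")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return leader[start - 1:end]


# Placeholder only: NOT the real leader, just a string of the right length.
placeholder_leader = "N" * LEADER_LENGTH

# Length of every tested fragment, keyed by its coordinates.
fragment_lengths = {
    (start, end): len(extract(placeholder_leader, start, end))
    for start, end in FRAGMENTS
}

for (start, end), length in fragment_lengths.items():
    print(f"p150({start}-{end}): {length} nt")
```

Substituting the actual leader sequence for the placeholder would yield the fragment sequences themselves rather than only their lengths.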

[0229] It was previously noted that many eukaryotic mRNAs contain short complementary sequence matches to 18S rRNA, raising the possibility that ribosome recruitment at some cellular IRESes might occur by base pairing between mRNA and 18S rRNA (Chappell et al., supra, 2000; Mauro and Edelman, Proc. Natl. Acad. Sci., USA 94:422-427, 1997; Tranque et al., Proc. Natl. Acad. Sci., USA 95:12238-12243, 1998; Hu et al., Proc. Natl. Acad. Sci. USA 96:1339-1344, 1999). Comparison of the YAP1 and p150 leader sequences to yeast 18S rRNA identified two and four complementary sequence matches, respectively, which contained stretches of up to 10 nucleotides of perfect complementarity (see FIG. 5). In addition, two of the matches are part of more extensive complementary matches of up to 25 nucleotides with 84% complementarity. The complementary match at nucleotides 130 to 142 of the p150 IRES (SEQ ID NO: 94; see FIG. 5) is correlated with a 60 nucleotide segment of the IRES that can inhibit IRES activity. Another complementary match of the p150 IRES at nucleotides 165 to 183 (SEQ ID NO: 96) is correlated with a 90 nucleotide segment of the IRES that contributes to internal initiation. Two other complementary matches of the p150 IRES at nucleotides 423 to 437 (SEQ ID NO: 98) and nucleotides 437 to 461 (SEQ ID NO: 100) are partially or fully contained within a 52 nucleotide segment with IRES activity (see FIG. 5).

[0230] Although it was previously suggested that the yeast translation machinery may be capable of mediating internal initiation (Iizuka et al., Mol. Cell. Biol. 14:7322-7330, 1994; Paz et al., J. Biol. Chem. 274:21741-21745, 1999, each of which is incorporated herein by reference), the present example demonstrates unequivocally that yeast IRES sequences contained within the YAP1 and p150 leader sequences can function in vegetatively growing cells. In addition, numerous sequences sharing complementarity with yeast 18S rRNA were identified within both leader sequences. Many other mRNAs and cellular IRESes contain similar features, and the complementary sequence matches to 18S rRNA can function as cis-acting sequences that affect translation (see, for example, Chappell et al., supra, 2000). In the case of the 9 nucleotide IRES module characterized from the transcribed leader of the mRNA that encodes the Gtx homeodomain, this segment is 100% complementary to 18S rRNA. Recruitment of ribosomes at this site appeared to involve base pairing to 18S rRNA within 40S ribosomal subunits. These results indicate that recruitment of ribosomes at some cellular IRES elements, including the yeast YAP1 and p150 IRESes, can occur directly due to base pairing to rRNA, a mechanism consistent with the modular nature of these cellular IRES elements.

[0231] The leader sequence of the YAP1 mRNA contained an IRES element that contributed to the efficient translation of this mRNA. Sequence features of this leader previously have been shown to affect translation and mRNA stability (Vilela et al., Nucl. Acids Res. 26:1150-1159, 1998; Ruiz-Echevarria and Peltz, Cell 101:741-751, 2000, each of which is incorporated herein by reference). One of these features, a short upstream open reading frame (uORF), did not inhibit translation of the main ORF, even though it was recognized by a large fraction of the scanning ribosomes. Inasmuch as uORFs generally inhibit the translation of downstream cistrons, these results indicated that reinitiation and leaky scanning were also involved in the efficient translation of the YAP1 mRNA.

[0232] The p150 IRES element was particularly active. Although most of the IRES activity was localized to nucleotides 160 to 508 (SEQ ID NO: 89), the IRES boundaries were not distinct. Moreover, several non-overlapping segments functioned independently, suggesting that this IRES has a modular composition. In a previous study of the IRES contained within the mRNA that encodes the Gtx homeodomain protein, the apparent modularity was pursued to identify a 9 nucleotide segment that functioned independently as an IRES module (see Chappell et al., supra, 2000).

[0233] The notion that short nucleotide sequences can recruit the translation machinery is not consistent with the proposal that higher order RNA conformations are uniformly important for the activity of some cellular IRESes. Indeed, the results obtained from deletion and fragment analyses of IRESes contained within other mammalian and insect cellular mRNAs indicate that many of these IRESes may also be modular (see, for example, Yang and Sarnow, Nucl. Acids Res. 25:2800-2807, 1997; Sella et al., Mol. Cell Biol. 19:5429-5440, 1999). The modular composition of cellular IRESes contrasts with those found in viruses. For example, in picornaviruses, the IRESes comprise several hundred nucleotides and contain RNA conformations that appear to be highly conserved and that are important for activity.

[0234] It is not known how widely internal initiation is used by yeast or higher eukaryotic mRNAs. The identification of numerous insect and mammalian IRESes may reflect a more extensive use of this mechanism in higher eukaryotes, or it may reflect incidental bias that has resulted in the evaluation of many more mRNAs from insects and mammals than from yeast. Some mammalian IRESes do not function in living yeast. In the case of poliovirus, the inactivity of its IRES in S. cerevisiae reflects a specific blockage that occurs via a short inhibitory RNA. The inactivity of some mammalian IRESes in yeast may also reflect trans factor requirements that are not provided by yeast cells or differences related to the ability of a sequence to bind a component of the translation machinery that is not identical to that in yeast. For example, p150 is the yeast homologue of mammalian translation initiation factor eIF4G, but the two are not functionally interchangeable.

[0235] In higher eukaryotes, IRESes are used by some mRNAs during the G2/M phase of the cell cycle and under conditions that reduce cap-dependent translation, as seen, for example, during different types of stress. In yeast, internal initiation may also be used to facilitate the translation of essential genes under similar conditions, including the condition of nutritional deficiency. It may be significant that IRESes were identified within the YAP1 and p150 leader sequences given that overexpression of YAP1 confers general resistance to many compounds. In addition, expression of p150 when cap-dependent translation is reduced may contribute to the translation of other mRNAs under these conditions.

[0236] The identification of yeast IRESes that function in vegetatively growing cells suggests that yeast and higher eukaryotes use similar mechanisms to initiate translation. The analysis of these mechanisms should be facilitated in yeast, since many strains of yeast exist with mutations in genes involved in translation. The ability to easily manipulate this organism genetically may also enable the identification of specific factors involved in internal initiation and should enable us to critically test the hypothesis that base pairing between certain IRES sequences and 18S rRNA is important for recruitment of ribosomes at these sites. In addition to these scientific interests, the identification of yeast IRESes that function as translational enhancers in monocistronic mRNAs also provides numerous applications for bioengineering.

EXAMPLE 6 Identification and Characterization of Synthetic IRES Elements

[0237] This example demonstrates that translational regulatory elements, including IRES elements, can be identified by screening libraries of random oligonucleotides.

[0238] To identify other short sequences with properties similar to those of the 9 nucleotide Gtx IRES module (CCGGCGGGT; SEQ ID NO: 102), B104 cells were infected with two retroviral libraries that contained random sequences of 9 or 18 nucleotides in the intercistronic region. Cells expressing both cistrons were sorted and sequences recovered from selected cells were examined for IRES activity using a dual luciferase dicistronic mRNA. Two novel IRES elements were identified, each of which contained a sequence with complementarity to 18S rRNA. When multiple copies of either element were linked together, IRES activities were dramatically enhanced. Moreover, the synthetic IRESes were differentially active in various cell types. The similarity of these properties to those of the Gtx IRES module (SEQ ID NO: 102) provides confirmatory evidence that short nucleotide sequences can function as translational regulatory elements.

[0239] The MESVR/EGFP/ECFP/RSVPro retroviral vector (SEQ ID NO: 109; see Example 4) was used to generate two libraries. In the first library, an oligonucleotide containing 18 random nucleotides (N)₁₈ was cloned into the Mlu I site of the polylinker. The sequence of this oligonucleotide is: acgcgtgatcca(N)₁₈cgagcgacgcgt (SEQ ID NO: 103; see Edelman et al., supra, 2000). In the second library, an oligonucleotide containing two segments of 9 random nucleotides (N)₉ was cloned into the Pac I and Mlu I sites of the polylinker. The sequence of this oligonucleotide was ttaattaagaattcttctgacat(a)₉ttctgacat(a)₉ttctgacat(a)₉(N)₉(a)₉(N′)₉(a)₉-gactcacaaccccagaaacagacatacgcgt (SEQ ID NO: 104), where N and N′ are different random nucleotide sequences. The design of this oligonucleotide was based on another previously described oligonucleotide, (S_(III)/S_(II))₅β (Chappell et al., supra, 2000). The (S_(III)/S_(II))₅β oligonucleotide did not have IRES activity and was used as a spacer control. The first library consisted of about 2.5×10⁵ bacterial clones and the second consisted of about 1.5×10⁵ bacterial clones. As such, each library represented only a small fraction of the potential sequence complexity of the random oligonucleotides (about 6.9×10¹⁰).
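The complexity figure of about 6.9×10¹⁰ follows from 18 random positions with four possible nucleotides each (4¹⁸). The short sketch below reproduces that arithmetic and estimates the largest fraction of the sequence space that each library could have sampled, under the simplifying assumption (not stated in the text) that every bacterial clone carried a distinct insert. The variable and function names are illustrative only.

```python
NUCLEOTIDES = 4        # a, c, g, t
RANDOM_POSITIONS = 18  # (N)18, or two (N)9 segments in the second library

# Total number of distinct random inserts: 4**18.
possible_sequences = NUCLEOTIDES ** RANDOM_POSITIONS

# Approximate library sizes (bacterial clones) as given in the text.
library_clones = {"first library": 2.5e5, "second library": 1.5e5}


def max_fraction_sampled(clones, total=possible_sequences):
    """Upper bound on coverage, assuming every clone is a unique sequence."""
    return clones / total


fractions = {name: max_fraction_sampled(n) for name, n in library_clones.items()}

print(f"possible sequences: {possible_sequences:.2e}")
for name, fraction in fractions.items():
    print(f"{name}: at most {fraction:.1e} of the sequence space")
```

On these assumptions, each library samples only a few parts per million of the possible sequences, which is the sense in which the text describes the sampling as a small fraction.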

[0240] The retroviral libraries were packaged in COS1 cells. Subconfluent cells were triply-transfected using the FuGENE 6 reagent (Roche Molecular Chemicals; Indianapolis Ind.) with plasmids encoding 1) the retroviral library, 2) MoMuLV gag and pol genes (pCMV-GP_((Sal))), and 3) the VSV G glycoprotein (see Tranque et al., supra, 1998). After 48 hr, retroviral particles were recovered from culture supernatant, filtered through a 0.45 μm membrane, and then used to infect B104 rat neural tumor cells (Bottenstein and Sato, Proc. Natl. Acad. Sci. USA 76:514-517, 1979). Approximately 2×10⁶ COS1 cells were transfected, and approximately the same number of B104 cells were subsequently infected. After 72 hr, cells were harvested and sorted by FACS on a FACSVantage SE (Becton Dickinson; San Jose Calif.). EGFP was excited with an argon laser tuned to 488 nm and fluorescence was recorded through a 530 nm bandpass filter. ECFP was excited with a krypton/argon laser tuned to 457 nm, and fluorescence was measured through a 495 nm bandpass filter. As controls for the FACS, B104 cells were infected with the following reference viruses: the parent vector (MESV/EGFP/ECFP/RSVPro), a virus encoding EGFP, a virus encoding ECFP, and a virus that contains the IRES from the encephalomyocarditis virus (EMCV) in the intercistronic region of the parent vector.

[0241] Cells co-expressing both EGFP and ECFP were isolated and returned to culture for 14 days. These cells were then resorted, and high co-expressors were isolated and further expanded in culture for 5 to 7 days. Genomic DNA was prepared using a QIAamp DNA miniprep kit (Qiagen). Intercistronic sequences were amplified by PCR using flanking primers, and cloned into the intercistronic region of RPh, which is a dicistronic vector that encodes Renilla luciferase protein as the first cistron and Photinus luciferase protein as the second cistron (Example 1). B104 cells were transiently co-transfected with the dual luciferase vector and with a vector expressing β-galactosidase, and luciferase and β-galactosidase assays were performed (see Example 1). Photinus luciferase activity values were normalized for transfection efficiency by means of β-galactosidase activity, and were then normalized to the activity of the RPh parent vector (first library) or of RPh containing the (S_(III)/S_(II))₅β oligonucleotide as a spacer control (second library).
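The two-step normalization described above (first to β-galactosidase activity, then to the reference construct) can be written as a simple ratio of ratios. The sketch below is illustrative only: the function name is hypothetical, and the readings in the usage example are invented numbers chosen to make the arithmetic easy to follow, not data from this Example.

```python
def normalized_ires_activity(photinus, beta_gal, ref_photinus, ref_beta_gal):
    """Fold activity of a test construct relative to a reference construct.

    photinus, beta_gal         -- raw readings for the test construct
    ref_photinus, ref_beta_gal -- raw readings for the reference construct
                                  (parent vector or spacer control)
    """
    if beta_gal <= 0 or ref_beta_gal <= 0 or ref_photinus <= 0:
        raise ValueError("reference and beta-galactosidase readings must be positive")
    # Step 1: correct each Photinus reading for transfection efficiency.
    sample = photinus / beta_gal
    reference = ref_photinus / ref_beta_gal
    # Step 2: express the corrected value relative to the reference construct.
    return sample / reference


# Invented readings, for illustration only.
fold = normalized_ires_activity(photinus=1000.0, beta_gal=0.5,
                                ref_photinus=500.0, ref_beta_gal=2.0)
print(f"fold activity relative to reference: {fold}")
```

With these invented inputs the corrected test value is 2000, the corrected reference value is 250, and the resulting fold activity is 8.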

[0242] Sequences of the oligonucleotide inserts were determined using an ABI system sequencer (PE Biosystems, Foster City, Calif.), and were compared using the Clustal X multiple sequence alignment program (Thompson et al., Nucl. Acids Res. 25:4876-4882, 1997), and with the BestFit program from the Genetics Computer Group software package (Devereux et al., Nucl. Acids Res. 12:387-395, 1984). Sequence matches were evaluated by comparing BestFit quality scores to those obtained when the selected sequences were randomly shuffled 10 times and compared to 18S rRNA. Secondary structure predictions were made using mfold version 3.0 (Zuker et al., in “RNA Biochemistry and Biotechnology” (ed. Clark; Kluwer academic publishers 1999), pages 11-43; Mathews et al., J. Mol. Biol. 288:911-940, 1999). Northern blot analysis was performed as described in Example 1 using a riboprobe encompassing the entire coding region of the Photinus luciferase gene.
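The logic of the shuffling control can be sketched as follows. This is not a reimplementation of BestFit: as a stand-in for the BestFit quality score, the sketch scores a candidate by its longest contiguous Watson-Crick complementary stretch against a target, ignoring gaps, mismatches, and G-U pairs. The target used in the usage example is a placeholder built around the reverse complement of the Gtx module and is not the 18S rRNA sequence; all function names are hypothetical.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA-alphabet sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_complementary_run(candidate, target):
    """Longest stretch of candidate that can pair contiguously with target."""
    probe = reverse_complement(candidate)
    target = target.upper()
    best = 0
    prev = [0] * (len(target) + 1)
    # Longest common substring between probe and target (dynamic programming).
    for i in range(1, len(probe) + 1):
        cur = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            if probe[i - 1] == target[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def shuffle_control(candidate, target, n_shuffles=10, seed=0):
    """Score the candidate and n_shuffles random shuffles of its letters."""
    rng = random.Random(seed)
    observed = longest_complementary_run(candidate, target)
    shuffled_scores = []
    for _ in range(n_shuffles):
        letters = list(candidate)
        rng.shuffle(letters)
        shuffled_scores.append(longest_complementary_run("".join(letters), target))
    return observed, shuffled_scores


# Placeholder target (NOT 18S rRNA): contains a perfect match to the Gtx module.
gtx_module = "CCGGCGGGT"
placeholder_target = "ATATAT" + reverse_complement(gtx_module) + "TATATA"

observed, shuffled_scores = shuffle_control(gtx_module, placeholder_target)
print("observed score:", observed)
print("scores for shuffled sequences:", shuffled_scores)
```

A match would be considered noteworthy when the observed score clearly exceeds the scores obtained with the shuffled sequences, which preserve base composition but not order.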

[0243] The retroviral library containing the random 18 nucleotide inserts was examined. This library, derived from 2.5×10⁵ retroviral plasmids, was used to infect approximately 2×10⁶ rat B104 neural tumor cells. After 72 hr, cells that co-expressed both EGFP and ECFP, corresponding to approximately 0.5% of the cells, were isolated by FACS. These cells were cultured for 14 days, sorted again by FACS, and high co-expressors, corresponding to approximately 4% of cells, were collected and grown. The twice sorted cells were compared to cells that had been infected with the virus that contained the EMCV IRES between the EGFP and ECFP genes. Both cell populations showed variable expression, suggesting that IRES activity can vary among individual cells, perhaps reflecting cell cycle differences in the population.

[0244] Intercistronic sequences contained within the population of twice sorted cells were isolated by genomic PCR, and cloned into the intercistronic polylinker of the RPh vector (see Example 1). This dual luciferase vector has a stable hairpin-forming sequence in the transcribed leader region upstream of the Renilla open reading frame. The hairpin structure blocks scanning ribosomes and therefore suppresses translation of the first cistron. Fifty clones were picked at random and plasmid DNA was prepared, sequenced, and transiently transfected into B104 cells. Of the 45 clones that were successfully sequenced, 39 contained unique 18 nucleotide inserts. The sequences of the other 6 clones were each represented more than once, which may reflect the relatively low complexity of selected sequences in these twice sorted cells.

[0245] The sequenced clones were tested in transfected cells and most activities were weak or at a background level. However, one sequence, designated intercistronic sequence 1-23 (ICS1-23; SEQ ID NO: 105), demonstrated enhanced Photinus luciferase activity approximately 8-fold greater than the control constructs. This level of activity was similar to that observed for one copy of the Gtx IRES module (Example 1).

[0246] A sequence comparison between ICS1-23 (SEQ ID NO: 105) and 18S rRNA (SEQ ID NO: 107) revealed a complementary match between the 3′ end of the IRES and 18S rRNA at nucleotides 1311-1324 (FIG. 4). This match has a BestFit quality score that is significantly greater than that obtained with 10 randomized variations of this sequence. To address whether the region of complementarity within ICS1-23 was associated with the IRES activity, the 30 nucleotide ICS1-23 sequence, which includes the 18 nucleotide random sequence together with 12 nucleotides of flanking sequence, was divided into two segments of 15 nucleotides each (see FIG. 4). The first 15 nucleotide segment (ICS1-23a) lacked any complementarity to 18S rRNA, while the second segment contained the complementary match to 18S rRNA (ICS1-23b; CAGCGGAAACGAGCG; SEQ ID NO: 106).

[0247] Multiple linked copies of the Gtx IRES module (SEQ ID NO: 102) had been shown to be more active than the corresponding monomer. Accordingly, multimers of each segment of ICS1-23 were synthesized, with each repeated segment separated by nine adenosine nucleotides (poly(A)₉). Although three linked copies of the ICS1-23a segment (see FIG. 4) did not enhance Photinus luciferase expression, constructs containing three and five linked copies of ICS1-23b (SEQ ID NO: 106) enhanced Photinus luciferase activity as compared to ICS1-23. These results indicate that the sequence of ICS1-23 that shares complementarity with 18S rRNA (i.e., SEQ ID NO: 106) has IRES activity. Northern blot analysis of RNA from cells expressing the five linked copies of ICS1-23b (SEQ ID NO: 106) revealed a single hybridizing band corresponding in size to the full length dicistronic mRNA, thus confirming that ICS1-23b did not enhance Photinus luciferase activity by other mechanisms such as alternative splicing or by functioning as a promoter.
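The layout of such multimers, in which repeated copies of a module are separated by nine adenosine nucleotides, can be sketched as a simple string join. The module sequences are those given in the text (ICS1-23b, SEQ ID NO: 106; Gtx module, SEQ ID NO: 102); the function name is hypothetical, and the sketch assumes spacers only between copies. Any flanking spacer or cloning-site sequence present in the actual constructs is not represented.

```python
POLY_A9 = "A" * 9  # nine adenosine nucleotides between repeated segments


def linked_copies(module, copies, spacer=POLY_A9):
    """Join `copies` copies of `module`, separated (not flanked) by `spacer`."""
    if copies < 1:
        raise ValueError("at least one copy is required")
    return spacer.join([module] * copies)


ICS1_23B = "CAGCGGAAACGAGCG"  # 15 nt, SEQ ID NO: 106
GTX_MODULE = "CCGGCGGGT"      # 9 nt, SEQ ID NO: 102

ics1_23b_x3 = linked_copies(ICS1_23B, 3)
ics1_23b_x5 = linked_copies(ICS1_23B, 5)
gtx_x5 = linked_copies(GTX_MODULE, 5)

print(ics1_23b_x3)
print(len(ics1_23b_x5), "nt for five linked copies of ICS1-23b")
```

Under this layout, n copies of a module of length m give a core of n·m + 9·(n − 1) nucleotides, for example 111 nucleotides for five copies of the 15 nucleotide ICS1-23b segment.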

[0248] The second retroviral library, which contained random 9 nucleotide segments separated by a poly(A)₉ spacer in the intercistronic region of the encoded dicistronic mRNA, was examined in order to identify smaller translational regulatory elements. Incorporation of the spacer sequence was based on the determination that the 9 nucleotide Gtx IRES module (SEQ ID NO: 102), when present in multiple copies separated by the poly(A)₉ spacer, exhibited greater IRES activity than a single copy of the module.

[0249] Approximately 2×10⁶ B104 cells were transduced with the second retroviral library, which was derived from 1.5×10⁵ retroviral plasmids. Approximately 0.3% of the cells were selected by FACS, and cultured and sorted a second time. Approximately 3% of the latter cells were high co-expressors. The oligonucleotide inserts were recovered by genomic PCR and shotgun cloned into the intercistronic region of the RPh vector. One hundred clones were picked at random and 84 were successfully sequenced, yielding 37 different sequences. Fifteen of the sequences were represented two or more times, indicating that the complexity of the sequences represented in these twice sorted cells was somewhat lower than that of the first library. When tested by transient transfection in B104 cells, most sequences enhanced Photinus luciferase activity weakly (about 2-fold or less above background), and none were as active as ICS1-23 (SEQ ID NO: 105).

[0250] Six of the sequences, which were isolated four or more times from the twice sorted cells, were examined further. Each of these sequences contained two 9 nucleotide segments, which were tested individually as five linked copies. One of these constructs, containing a 9 nucleotide segment designated ICS2-17.2 (TCCGGTCGT; SEQ ID NO: 108), showed enhanced Photinus luciferase activity. In contrast, five linked copies of ICS2-17.1, the other 9 nucleotide segment contained within selected sequence ICS2-17, did not have IRES activity. RNA analysis confirmed that a single transcript was produced from the construct, and that the increase in Photinus luciferase activity was derived from an intact dicistronic mRNA. These results indicate that ICS2-17.2 (SEQ ID NO: 108) functions as an IRES.

[0251] Five linked copies of both ICS1-23b (SEQ ID NO: 106) and ICS2-17.2 (SEQ ID NO: 108) also were examined using the 5′ UTR of a monocistronic reporter mRNA. In 7 cell lines tested, (ICS1-23b)₅ blocked translation by approximately 70% and (ICS2-17.2)₅ slightly enhanced translation. In both cases, mRNA levels appeared to be unaffected. This result indicates that ICS1-23b (SEQ ID NO: 106) and ICS2-17.2 (SEQ ID NO: 108) function as IRES elements in the dicistronic mRNAs and not as transcriptional promoters or enhancers. As with ICS1-23b, sequence comparisons identified a complementary match between ICS2-17.2 and 18S rRNA with a BestFit quality score that is significantly greater than that obtained with 10 randomized variations of this sequence.

[0252] The activity of the selected ICS1-23b (SEQ ID NO: 106) and ICS2-17.2 (SEQ ID NO: 108) IRES modules was examined in additional cell lines to determine whether they were active in cell types other than the B104 neuroblastoma cells. A construct of five linked copies of each module was active in each of the cell lines tested, including rat glioma C6 cells, human neuroblastoma SK cells, mouse neuroblastoma N2a cells, mouse NIH-3T3 fibroblasts, human cervical carcinoma HeLa cells, normal rat kidney NRK cells, and mouse muscle myoblast C2C12 cells. The activities of these synthetic IRESes varied as much as ten-fold between cell lines, and also varied with respect to each other. However, the pattern of activity of the ICS1-23b (SEQ ID NO: 106) module in the different cell lines tested was similar to that observed for ten linked copies of the Gtx IRES module (SEQ ID NO: 102).

[0253] These results demonstrate that relatively small discrete nucleotide sequences can act as translational regulatory elements, including as IRES elements, which mediate cap-independent translation. Furthermore, the two IRES modules identified in this Example were selected from only a minute sampling of the total complexity of the random oligonucleotides. Thus, it is likely that screening a more complex library of random oligonucleotides will identify additional short nucleotide sequences having IRES or other translational regulatory activity.

[0254] It is remarkable that each of the short IRES elements disclosed herein, including the Gtx IRES (SEQ ID NO: 102), the ICS1-23b IRES (SEQ ID NO: 106), and the ICS2-17.2 IRES (SEQ ID NO: 108), can promote internal initiation. Each of these three IRES modules contains a complementary match to a different segment of 18S rRNA, suggesting that a direct interaction occurs between the IRES module and the 40S ribosomal subunit via base pairing to 18S rRNA. Alternatively, one or more of the IRES modules may recruit 40S ribosomal subunits by interacting with a protein component of the translational machinery, for example, a ribosomal protein, an initiation factor, or some other bridging protein. The ability to initiate translation internally by binding to an initiation factor has been reported, wherein an iron response element (IRE) and the bacteriophage λ transcriptional anti-terminator box B element were both demonstrated to function as IRESes in the presence of fusion proteins between the appropriate binding protein for these RNA elements and eIF4G (DeGregorio et al., EMBO J. 18:4865-4874, 1999). However, the lack of appreciable sequence similarities between the IRES modules disclosed herein and cellular IRESes in general suggests that a wide variety of nucleotide sequences can function in internal translation initiation, and suggests that different sequences may recruit pre-initiation complexes by different mechanisms.

[0255] The observation that synthetic IRESes comprising multimers of ICS1-23b (SEQ ID NO: 106), ICS2-17.2 (SEQ ID NO: 108), or the Gtx (SEQ ID NO: 102) IRES module show enhanced IRES activity as compared to the corresponding monomers suggests that multiple copies of the IRES module may increase the probability of recruiting 40S ribosomal subunits. A similar observation has been made for eIF4G tethered to the IRE-binding protein, where there was an approximately linear increase in translation when the number of IRE binding sites was increased from one site to three (DeGregorio et al., supra, 1999).

[0256] An arresting feature of cellular IRESes, as well as of the disclosed IRES modules, is their variable potency in different cell types. As such, selection for IRESes in a variety of cell types can provide a means to identify additional elements having cell-specific and tissue-specific activities. If ribosomal recruitment requires direct interaction of IRESes with 18S rRNA, variations in efficiency may reflect differences in the accessibility of particular segments of 18S rRNA in different cell types. Alternatively, some IRES modules may require or be blocked by binding proteins that are differentially expressed in various cell types. Such possibilities can be distinguished by determining which proteins or components of the translation machinery bind to particular IRES sequences in various differentiated cells. In view of the modular nature of cellular IRESes, combinations of synthetic IRESes can be constructed and elements having desirable regulatory actions can be selected. Such a combinatorial approach can be used to construct synthetic IRESes having variable translational regulatory activity, for example, highly restricted or widespread translational activity.

EXAMPLE 7 Design of IRES Modules Based on rRNA Structure

[0257] This example demonstrates that synthetic oligonucleotides having IRES activity can be designed based on the structure of ribosomal RNA molecules.

[0258] As disclosed herein, cellular IRESes exist as modular structures composed of short, independent oligonucleotides, including oligonucleotides that are complementary to 18S rRNA, and synthetic IRESes have been identified that also are complementary to rRNA oligonucleotide sequences. These results indicate that recruitment of ribosomal subunits by IRES modules is directed by base pairing of the IRES element to the rRNA within the ribosomal subunit.

[0259] The 9 nucleotide Gtx IRES module (SEQ ID NO: 102) is 100% complementary to an oligonucleotide sequence of 18S rRNA, and was tested as an IRES module based on this observation. In addition, the ability of the Gtx IRES module (SEQ ID NO: 102) to recruit 40S ribosomal subunits by base pairing to 18S rRNA was examined. Nitrocellulose filter-binding and electrophoretic mobility gel shift assays established a physical link between the 9 nucleotide Gtx IRES module (SEQ ID NO: 102) and dissociated ribosomal subunits, but not with other components of cell lysates. Transfection studies using dicistronic constructs that contained the Gtx IRES module (SEQ ID NO: 102) or mutants of this sequence demonstrated that internal initiation was maximal with a mutant module sharing 7 nucleotides of complementarity with 18S rRNA, and that as the degree of complementarity was progressively increased or decreased, IRES activity was decreased and, ultimately, lost. When tested in the 5′ or 3′ UTR of a monocistronic mRNA, sequences that enhanced internal initiation also functioned as translational enhancers. However, only those sequences with increased complementarity to 18S rRNA inhibited both internal initiation and translation in monocistronic mRNAs. This inhibition appeared to involve stable interactions between the mRNA and 40S ribosomal subunits as determined by polysome analysis. These results indicate that internal initiation of translation can occur at short nucleotide sequences by base pairing to 18S rRNA.

[0260] Sequence analysis of the IRES modules recovered from the selection studies showed that most of the selected sequences contained complementary sequence matches of 8 to 9 nucleotides to different regions of the 18S rRNA (FIG. 6). Furthermore, many of the matches are to un-base paired regions of the rRNA (see FIG. 6B). Moreover, in some cases, several selected synthetic IRESes with slightly different sequences were complementary to the same region of the 18S rRNA (see, also, Owens et al., 2001, which is incorporated herein by reference). These results indicate that synthetic translational regulatory elements can be designed based on rRNA sequences such as those set forth in SEQ ID NOS: 110-112, particularly to un-base paired rRNA sequences, which can be predicted using methods as disclosed herein, such that the synthetic translational regulatory elements are complementary to a selected rRNA target sequence. Methods of predicting secondary structure for rRNA are known in the art and include, for example, methods using the mfold version 3.0 software (Zuker et al., in “RNA Biochemistry and Biotechnology” (ed. Clark; Kluwer academic publishers 1999), pages 11-43; Mathews et al., J. Mol. Biol. 288:911-940, 1999).
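The design step described above, in which a candidate element is made complementary to a selected rRNA target sequence, can be sketched as taking the antiparallel Watson-Crick complement of the target. The sketch below is illustrative only. The function name is hypothetical, G-U pairs and secondary structure are ignored, and the target in the usage example is simply the sequence to which the 9 nucleotide Gtx module (CCGGCGGGT) would be perfectly complementary; it was derived from the module for illustration and was not read from SEQ ID NOS: 110-112.

```python
# Watson-Crick partners, written in the DNA alphabet; accepts U or T in the target.
PAIRING = str.maketrans("ACGUT", "TGCAA")


def design_complementary_module(rrna_segment):
    """Return the DNA-alphabet sequence (5' to 3') that is antiparallel
    complementary to an rRNA target segment given 5' to 3'."""
    segment = rrna_segment.upper()
    unexpected = set(segment) - set("ACGUT")
    if unexpected:
        raise ValueError(f"unexpected characters in target: {sorted(unexpected)}")
    # Complement each base, then reverse so the result also reads 5' to 3'.
    return segment.translate(PAIRING)[::-1]


# Illustrative target only (see the assumptions stated above).
illustrative_target = "ACCCGCCGG"
candidate = design_complementary_module(illustrative_target)
print(candidate)
```

In practice the target segment would be chosen from a region predicted to be un-base paired, and candidate elements designed in this way would still need to be tested for activity as described in the Examples.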

[0261] Although the invention has been described with reference to the above examples, it will be understood that modifications and variations are encompassed within the spirit and scope of the invention. Accordingly, the invention is limited only by the following claims.

1 112 1 6279 DNA Artificial sequence vector 1 gaattctcat gtttgacagcttatcatcga ttagtccaat ttgttaaaga caggatatca 60 gtggtccagg ctcagttttgactcaacaat atcaccagct gaagcctata gagtacgagc 120 catagataga ataaaagattttatttagtc tccagaaaaa ggggggaatg aaagacccca 180 cctgtaggtt tggcaagctagaaatgtagt cttatgcaat acacttgtag tcttgcaaca 240 tggtaacgat gagttagcaacatgccttac aaggagagaa aaagcaccgt gcatgccgat 300 tggtggaagt aaggtggtacgatcgtgcct tattaggaag gcaacagaca ggtctgacat 360 ggattggacg aaccactctagagaaccatc agatgtttcc agggtgcccc aaggacctga 420 aaatgaccct gtgccttatttgaactaacc aatcagttcg cttctcgctt ctgttcgcgc 480 gcttctgctc cccgagctcaataaaagagc ccacaacccc tcactcggcg cgccagtcct 540 ccgattgact gcgtcgcccgggtacccgta ttcccaataa agcctcttgc tgtttgcatc 600 cgaatcgtgg actcgctgatccttgggagg gtctcctcag attgattgac tgcccacctc 660 ggggtctttc atttggaggttccaccgaga tttggagacc ccagcccagg gaccaccgac 720 ccccccgccg ggaggtaagctggccagcgg tcgtttcgtg tctgtctctg tctttgtgcg 780 tgtttgtgcc ggcatctaatgtttgcgcct gcgtctgtac tagttagcta actagctctg 840 tatctggcgg acccgtggtggaactgacga gttctgaaca cccggccgca accctgggag 900 acgtcccagg gactttgggggccgtttttg tggcccgacc tgaggaaggg agtcgatgtg 960 gaatccgacc ccgtcaggatatgtggttct ggtaggagac gagaacctaa aacagttccc 1020 gcctccgtct gaatttttgctttcggtttg gaaccgaagc cgcgcgtctt gtctgctgca 1080 gcatcgttct gtgttgtctctgtctgactg tgtttctgta tttgtctgaa aattagggcc 1140 agactgttac cactcccttaagtttgacct taggtcactg gaaagatgtc gagcggatcg 1200 ctcacaacca gtcggtagatgtcaagaaga gacgttgggt taccttctgc tctgcagaat 1260 ggccaacctt taacgtcggatggccgcgag acggcacctt taaccgagac ctcatcaccc 1320 aggttaagat caaggtctttcacctggccc gcatggacac ccagaccagg tcccctacat 1380 cgtgacctgg gaagccttggcttttgaccc ccctccctgg gtcaagccct ttgtacaccc 1440 taagcctccg cctcctcttcctccatccgc cccgtctctc ccccttgaac ctcctcgttc 1500 gaccccgcct cgatcctccctttatccagc cctcactcct tctctaggcg ccggaattcg 1560 ttcatggtga gcaagggcgaggagctgttc accggggtgg tgcccatcct ggtcgagctg 1620 gacggcgacg taaacggccacaagttcagc gtgtccggcg agggcgaggg cgatgccacc 1680 tacggcaagc 
tgaccctgaagttcatctgc accaccggca agctgcccgt gccctggccc 1740 accctcgtga ccaccctgacctacggcgtg cagtgcttca gccgctaccc cgaccacatg 1800 aagcagcacg acttcttcaagtccgccatg cccgaaggct acgtccagga gcgcaccatc 1860 ttcttcaagg acgacggcaactacaagacc cgcgccgagg tgaagttcga gggcgacacc 1920 ctggtgaacc gcatcgagctgaagggcatc gacttcaagg aggacggcaa catcctgggg 1980 cacaagctgg agtacaactacaacagccac aacgtctata tcatggccga caagcagaag 2040 aacggcatca aggccaacttcaagacccgc cacaacatcg aggacggcgg cgtgcagctc 2100 gccgaccact accagcagaacacccccatc ggcgacggcc ccgtgctgct gcccgacaac 2160 cactacctga gcacccagtccgccctgagc aaagacccca acgagaagcg cgatcacatg 2220 gtcctgctgg agttcgtgaccgccgccggg atcactctcg gcatggacga gctgtacaag 2280 taaagcggcc gcgactctagagtcgaggat cctctagagg aattcccgcc cctctccctc 2340 ccccccccct aacgttactggccgaagccg cttggaataa ggccggtgtg cgtttgtcta 2400 tatgttattt tccaccatattgccgtcttt tggcaatgtg agggcccgga aacctggccc 2460 tgtcttcttg acgagcattcctaggggtct ttcccctctc gccaaaggaa tgcaaggtct 2520 gttgaatgtc gtgaaggaagcagttcctct ggaagcttct tgaagacaaa caacgtctgt 2580 agcgaccctt tgcaggcagcggaacccccc acctggcgac aggtgcctct gcggccaaaa 2640 gccacgtgta taagatacacctgcaaaggc ggcacaaccc cagtgccacg ttgtgagttg 2700 gatagttgtg gaaagagtcaaatggctctc ctcaagcgta ttcaacaagg ggctgaagga 2760 tgcccagaag gtaccccattgtatgggatc tgatctgggg cctcggtgca catgctttac 2820 atgtgtttag tcgaggttaaaaaaacgtct aggccccccg aaccacgggg acgtggtttt 2880 cctttgaaaa acacgatgataagcttgcca caacccacaa ggagacgacc ttccatgacc 2940 gagtacaagc ccacggtgcgcctcgccacc cgcgacgacg tcccccgggc cgtacgcacc 3000 ctcgccgccg cgttcgccgactaccccgcc acgcgccaca ccgtcgaccc ggaccgccac 3060 atcgagcggg tcaccgagctgcaagaactc ttcctcacgc gcgtcgggct cgacatcggc 3120 aaggtgtggg tcgcggacgacggcgccgcg gtggcggtct ggaccacgcc ggagagcgtc 3180 gaagcggggg cggtgttcgccgagatcggc ccgcgcatgg ccgagttgag cggttcccgg 3240 ctggccgcgc agcaacagatggaaggcctc ctggcgccgc accggcccaa ggagcccgcg 3300 tggttcctgg ccaccgtcggcgtctcgccc gaccaccagg gcaagggtct gggcagcgcc 3360 gtcgtgctcc ccggagtggaggcggccgag cgcgccgggg 
tgcccgcctt cctggagacc 3420 tccgcgcccc gcaacctccccttctacgag cggctcggct tcaccgtcac cgccgacgtc 3480 gagtgcccga aggaccgcgcgacctggtgc atgacccgca agcccggtgc ctgacgcccg 3540 ccccacgacc cgcagcgcccgaccgaaagg agcgcacgac cccatgtcga cggtatcgat 3600 aaaataaaag attttatttagtctccagaa aaagggggga atgaaagacc ccacctgtag 3660 gtttggcaag ctagacatgcatcgacgcgt gaagatctga aggggggcta taaaagcgat 3720 ggatccgagc tcggccctcattctggagac tctagaggcc ttgaattcgc ggccgcgcca 3780 gtcctccgat tgactgcgtcgcccgggtac cgtgtatcca ataaaccctc ttgcagttgc 3840 atccgacttg tggtctcgctgttccttggg agggtctcct ctgagtgatt gactacccgt 3900 cagcgggggt ctttcatttgggggctcgtc cgggatcggg agacccctgc ccagggacca 3960 ccgacccacc accgggaggtaagctggctg cctcgcgcgt ttcggtgatg acggtgaaaa 4020 cctctgacac atgcagctcccggagacggt cacagcttgt ctgtaagcgg atgccgggag 4080 cagacaagcc cgtcagggcgcgtcagcggg tgttggcggg tgtcggggcg cagccatgac 4140 ccagtcacgt agcgatagcggagtgtatac tggcttaact atgcggcatc agagcagatt 4200 gtactgagag tgcaccatatgcggtgtgaa ataccgcaca gatgcgtaag gagaaaatac 4260 cgcatcaggc gctcttccgcttcctcgctc actgactcgc tgcgctcggt cgttcggctg 4320 cggcgagcgg tatcagctcactcaaaggcg gtaatacggt tatccacaga atcaggggat 4380 aacgcaggaa agaacatgtgagcaaaaggc cagcaaaagg ccaggaaccg taaaaaggcc 4440 gcgttgctgg cgtttttccataggctccgc ccccctgacg agcatcacaa aaatcgacgc 4500 tcaagtcaga ggtggcgaaacccgacagga ctataaagat accaggcgtt tccccctgga 4560 agctccctcg tgcgctctcctgttccgacc ctgccgctta ccggatacct gtccgccttt 4620 ctcccttcgg gaagcgtggcgctttctcat agctcacgct gtaggtatct cagttcggtg 4680 taggtcgttc gctccaagctgggctgtgtg cacgaacccc ccgttcagcc cgaccgctgc 4740 gccttatccg gtaactatcgtcttgagtcc aacccggtaa gacacgactt atcgccactg 4800 gcagcagcca ctggtaacaggattagcaga gcgaggtatg taggcggtgc tacagagttc 4860 ttgaagtggt ggcctaactacggctacact agaaggacag tatttggtat ctgcgctctg 4920 ctgaagccag ttaccttcggaaaaagagtt ggtagctctt gatccggcaa acaaaccacc 4980 gctggtagcg gtggtttttttgtttgcaag cagcagatta cgcgcagaaa aaaaggatct 5040 caagaagatc ctttgatcttttctacgggg tctgacgctc agtggaacga aaactcacgt 5100 taagggattt 
tggtcatgagattatcaaaa aggatcttca cctagatcct tttaaattaa 5160 aaatgaagtt ttaaatcaatctaaagtata tatgagtaaa cttggtctga cagttaccaa 5220 tgcttaatca gtgaggcacctatctcagcg atctgtctat ttcgttcatc catagttgcc 5280 tgactccccg tcgtgtagataactacgata cgggagggct taccatctgg ccccagtgct 5340 gcaatgatac cgcgagacccacgctcaccg gctccagatt tatcagcaat aaaccagcca 5400 gccggaaggg ccgagcgcagaagtggtcct gcaactttat ccgcctccat ccagtctatt 5460 aattgttgcc gggaagctagagtaagtagt tcgccagtta atagtttgcg caacgttgtt 5520 gccattgctg caggcatcgtggtgtcacgc tcgtcgtttg gtatggcttc attcagctcc 5580 ggttcccaac gatcaaggcgagttacatga tcccccatgt tgtgcaaaaa agcggttagc 5640 tccttcggtc ctccgatcgttgtcagaagt aagttggccg cagtgttatc actcatggtt 5700 atggcagcac tgcataattctcttactgtc atgccatccg taagatgctt ttctgtgact 5760 ggtgagtact caaccaagtcattctgagaa tagtgtatgc ggcgaccgag ttgctcttgc 5820 ccggcgtcaa cacgggataataccgcgcca catagcagaa ctttaaaagt gctcatcatt 5880 ggaaaacgtt cttcggggcgaaaactctca aggatcttac cgctgttgag atccagttcg 5940 atgtaaccca ctcgtgcacccaactgatct tcagcatctt ttactttcac cagcgtttct 6000 gggtgagcaa aaacaggaaggcaaaatgcc gcaaaaaagg gaataagggc gacacggaaa 6060 tgttgaatac tcatactcttcctttttcaa tattattgaa gcatttatca gggttattgt 6120 ctcatgagcg gatacatatttgaatgtatt tagaaaaata aacaaatagg ggttccgcgc 6180 acatttcccc gaaaagtgccacctgacgtc taagaaacca ttattatcat gacattaacc 6240 tataaaaata ggcgtatcacgaggcccttt cgtcttcaa 6279 2 3404 DNA Artificial sequence vector 2ctcgagatct gtaatacgac tcactatagg gctgcaggaa acagctatga ccatgatatc 60atagcggccg cagatctggc gattggggcg cgcgcgcctc cttcggtttg gggctaatta 120taaagtggct ccagcagccg ttaagccccg ggacggcgag gcaggcgctc agagccccgc 180agcctggccc gtgaccccgc agagacgctg aggaagcttc catggccaag ttgaccagtg 240ccgttccggt gctcaccgcg cgcgacgtcg ccggagcggt cgagttctgg accgaccggc 300tcgggttctc ccgggacttc gtggaggacg acttcgccgg tgtggtccgg gacgacgtga 360ccctgttcat cagcgcggtc caggaccagg tggtgccgga caacaccctg gcctgggtgt 420gggtgcgcgg cctggacgag ctgtacgccg agtggtcgga ggtcgtgtcc acgaacttcc 480gggacgcctc cgggccggcc atgaccgaga tcggcgagca 
gccgtggggg cgggagttcg 540ccctgcgcga cccggccggc aactgcgtgc acttcgtggc cgaggagcag gactgacact 600cgacctcgaa acttgtttat tgcagcttat aatggttaca aataaagcaa tagcatcaca 660aatttcacaa ataaagcatt tttttcactg cattctagtt gtggtttgtc caaactcatc 720aatgtatctt atcatgtctg gatccgtcga cgtcaggtgg cacttttcgg ggaaatgtgc 780gcggaacccc tatttgttta tttttctaaa tacattcaaa tatgtatccg ctcatgagac 840aataaccctg ataaatgctt caataatatt gaaaaaggaa gagtcctgag gcggaaagaa 900ccagctgtgg aatgtgtgtc agttagggtg tggaaagtcc ccaggctccc cagcaggcag 960aagtatgcaa agcatgcatc tcaattagtc agcaaccagg tgtggaaagt ccccaggctc 1020cccagcaggc agaagtatgc aaagcatgca tctcaattag tcagcaacca tagtcccgcc 1080cctaactccg cccatcccgc ccctaactcc gcccagttcc gcccattctc cgccccatgg 1140ctgactaatt ttttttattt atgcagaggc cgaggccgcc tcggcctctg agctattcca 1200gaagtagtga ggaggctttt ttggaggcct aggcttttgc aaagatcgat caagagacag 1260gatgaggatc gtttcgcatg attgaacaag atggattgca cgcaggttct ccggccgctt 1320gggtggagag gctattcggc tatgactggg cacaacagac aatcggctgc tctgatgccg 1380ccgtgttccg gctgtcagcg caggggcgcc cggttctttt tgtcaagacc gacctgtccg 1440gtgccctgaa tgaactgcaa gacgaggcag cgcggctatc gtggctggcc acgacgggcg 1500ttccttgcgc agctgtgctc gacgttgtca ctgaagcggg aagggactgg ctgctattgg 1560gcgaagtgcc ggggcaggat ctcctgtcat ctcaccttgc tcctgccgag aaagtatcca 1620tcatggctga tgcaatgcgg cggctgcata cgcttgatcc ggctacctgc ccattcgacc 1680accaagcgaa acatcgcatc gagcgagcac gtactcggat ggaagccggt cttgtcgatc 1740aggatgatct ggacgaagag catcaggggc tcgcgccagc cgaactgttc gccaggctca 1800aggcgagcat gcccgacggc gaggatctcg tcgtgaccca tggcgatgcc tgcttgccga 1860atatcatggt ggaaaatggc cgcttttctg gattcatcga ctgtggccgg ctgggtgtgg 1920cggaccgcta tcaggacata gcgttggcta cccgtgatat tgctgaagag cttggcggcg 1980aatgggctga ccgcttcctc gtgctttacg gtatcgccgc tcccgattcg cagcgcatcg 2040ccttctatcg ccttcttgac gagttcttct gagcgggact ctggggttcg aaatgaccga 2100ccaagcgacg cccaacctgc catcacgaga tttcgattcc accgccgcct tctatgaaag 2160gttgggcttc ggaatcgttt tccgggacgc cggctggatg atcctccagc gcggggatct 2220catgctggag ttcttcgccc 
accctagggg gaggctaact gaaacacgga aggagacaat 2280accggaagga acccgcgcta tgacggcaat aaaaagacag aataaaacgc acggtgttgg 2340gtcgtttgtt cataaacgcg gggttcggtc ccagggctgg cactctgtcg ataccccacc 2400gagaccccat tggggccaat acgcccgcgt ttcttccttt tccccacccc accccccaag 2460ttcgggtgaa ggcccagggc tcgcagccaa cgtcggggcg gcaggccctg ccatagcctc 2520aggatgctac gttctagacg tcaggttact catatatact ttagattgat ttaaaacttc 2580atttttaatt taaaaggatc taggtgaaga tcctttttga taatctcatg accaaaatcc 2640cttaacgtga gttttcgttc cactgagcgt cagaccccgt agaaaagatc aaaggatctt 2700cttgagatcc tttttttctg cgcgtaatct gctgcttgca aacaaaaaaa ccaccgctac 2760cagcggtggt ttgtttgccg gatcaagagc taccaactct ttttccgaag gtaactggct 2820tcagcagagc gcagatacca aatactgtcc ttctagtgta gccgtagtta ggccaccact 2880tcaagaactc tgtagcaccg cctacatacc tcgctctgct aatcctgtta ccagtggctg 2940ctgccagtgg cgataagtcg tgtcttaccg ggttggactc aagacgatag ttaccggata 3000aggcgcagcg gtcgggctga acggggggtt cgtgcacaca gcccagcttg gagcgaacga 3060cctacaccga actgagatac ctacagcgtg agctatgaga aagcgccacg cttcccgaag 3120ggagaaaggc ggacaggtat ccggtaagcg gcagggtcgg aacaggagag cgcacgaggg 3180agcttccagg gggaaacgcc tggtatcttt atagtcctgt cgggtttcgc cacctctgac 3240ttgagcgtcg atttttgtga tgctcgtcag gggggcggag cctatggaaa aacgccagca 3300acgcggcctt tttacggttc ctggcctttt gctggccttt tgctcacatg ttctttcctg 3360cgttatcccc tgattctgtg gataaccgta ttaccgccat gcat 3404 3 4152 DNAArtificial sequence vector 3 ctcgagatct gtaatacgac tcactatagg gctgcaggaaacagctatga ccatgatatc 60 atagcggccg cagatctggc gattggggcg cgcgcgcctccttcggtttg gggctaatta 120 taaagtggct ccagcagccg ttaagccccg ggacggcgaggcaggcgctc agagccccgc 180 agcctggccc gtgaccccgc agagacgctg aggaagcttatatgaaaaag cctgaactca 240 ccgcgacgtc tgtcgagaag tttctgatcg aaaagttcgacagcgtctcc gacctgatgc 300 agctctcgga gggcgaagaa tctcgtgctt tcagcttcgatgtaggaggg cgtggatatg 360 tcctgcgggt aaatagctgc gccgatggtt tctacaaagatcgttatgtt tatcggcact 420 ttgcatcggc cgcgctcccg attccggaag tgcttgacattggggaattc agcgagagcc 480 tgacctattg catctcccgc cgtgcacagg 
gtgtcacgttgcaagacctg cctgaaaccg 540 aactgcccgc tgttctgcag ccggtcgcgg aggccatggatgcgatcgct gcggccgatc 600 ttagccagac gagcgggttc ggcccattcg gaccgcaaggaatcggtcaa tacactacat 660 ggcgtgattt catatgcgcg attgctgatc cccatgtgtatcactggcaa actgtgatgg 720 acgacaccgt cagtgcgtcc gtcgcgcagg ctctcgatgagctgatgctt tgggccgagg 780 actgccccga agtccggcac ctcgtgcacg cggatttcggctccaacaat gtcctgacgg 840 acaatggccg cataacagcg gtcattgact ggagcgaggcgatgttcggg gattcccaat 900 acgaggtcgc caacatcttc ttctggaggc cgtggttggcttgtatggag cagcagacgc 960 gctacttcga gcggaggcat ccggagcttg caggatcgccgcggctccgg gcgtatatgc 1020 tccgcattgg tcttgaccaa ctctatcaga gcttggttgacggcaatttc gatgatgcag 1080 cttgggcgca gggtcgatgc gacgcaatcg tccgatccggagccgggact gtcgggcgta 1140 cacaaatcgc ccgcagaagc gcggccgtct ggaccgatggctgtgtagaa gtactcgccg 1200 atagtggaaa ccgacgcccc agcactcgtg gggatcgggagatgggggag gctaactgaa 1260 acacggaagg agacaatacc ggaaggaacc cgcgctatgacggcaataaa aagacagaat 1320 aaaacgcacg ggtgttgggt cgtttgttca taaacgcggggttcggtccc agggctggca 1380 ctctgtcgat accccaccga gaccccattg gggccaatacgcccgcgttt cttccttttc 1440 cccaccccaa cccccaagtt cgggtgaagg cccagggctcgcagccaacg tcggtcgacg 1500 tcaggtggca cttttcgggg aaatgtgcgc ggaacccctatttgtttatt tttctaaata 1560 cattcaaata tgtatccgct catgagacaa taaccctgataaatgcttca ataatattga 1620 aaaaggaaga gtcctgaggc ggaaagaacc agctgtggaatgtgtgtcag ttagggtgtg 1680 gaaagtcccc aggctcccca gcaggcagaa gtatgcaaagcatgcatctc aattagtcag 1740 caaccaggtg tggaaagtcc ccaggctccc cagcaggcagaagtatgcaa agcatgcatc 1800 tcaattagtc agcaaccata gtcccgcccc taactccgcccatcccgccc ctaactccgc 1860 ccagttccgc ccattctccg ccccatggct gactaattttttttatttat gcagaggccg 1920 aggccgcctc ggcctctgag ctattccaga agtagtgaggaggctttttt ggaggcctag 1980 gcttttgcaa agatcgatca agagacagga tgaggatcgtttcgcatgat tgaacaagat 2040 ggattgcacg caggttctcc ggccgcttgg gtggagaggctattcggcta tgactgggca 2100 caacagacaa tcggctgctc tgatgccgcc gtgttccggctgtcagcgca ggggcgcccg 2160 gttctttttg tcaagaccga cctgtccggt gccctgaatgaactgcaaga cgaggcagcg 2220 cggctatcgt 
ggctggccac gacgggcgtt ccttgcgcagctgtgctcga cgttgtcact 2280 gaagcgggaa gggactggct gctattgggc gaagtgccggggcaggatct cctgtcatct 2340 caccttgctc ctgccgagaa agtatccatc atggctgatgcaatgcggcg gctgcatacg 2400 cttgatccgg ctacctgccc attcgaccac caagcgaaacatcgcatcga gcgagcacgt 2460 actcggatgg aagccggtct tgtcgatcag gatgatctggacgaagagca tcaggggctc 2520 gcgccagccg aactgttcgc caggctcaag gcgagcatgcccgacggcga ggatctcgtc 2580 gtgacccatg gcgatgcctg cttgccgaat atcatggtggaaaatggccg cttttctgga 2640 ttcatcgact gtggccggct gggtgtggcg gaccgctatcaggacatagc gttggctacc 2700 cgtgatattg ctgaagagct tggcggcgaa tgggctgaccgcttcctcgt gctttacggt 2760 atcgccgctc ccgattcgca gcgcatcgcc ttctatcgccttcttgacga gttcttctga 2820 gcgggactct ggggttcgaa atgaccgacc aagcgacgcccaacctgcca tcacgagatt 2880 tcgattccac cgccgccttc tatgaaaggt tgggcttcggaatcgttttc cgggacgccg 2940 gctggatgat cctccagcgc ggggatctca tgctggagttcttcgcccac cctaggggga 3000 ggctaactga aacacggaag gagacaatac cggaaggaacccgcgctatg acggcaataa 3060 aaagacagaa taaaacgcac ggtgttgggt cgtttgttcataaacgcggg gttcggtccc 3120 agggctggca ctctgtcgat accccaccga gaccccattggggccaatac gcccgcgttt 3180 cttccttttc cccaccccac cccccaagtt cgggtgaaggcccagggctc gcagccaacg 3240 tcggggcggc aggccctgcc atagcctcag gatgctacgttctagacgtc aggttactca 3300 tatatacttt agattgattt aaaacttcat ttttaatttaaaaggatcta ggtgaagatc 3360 ctttttgata atctcatgac caaaatccct taacgtgagttttcgttcca ctgagcgtca 3420 gaccccgtag aaaagatcaa aggatcttct tgagatcctttttttctgcg cgtaatctgc 3480 tgcttgcaaa caaaaaaacc accgctacca gcggtggtttgtttgccgga tcaagagcta 3540 ccaactcttt ttccgaaggt aactggcttc agcagagcgcagataccaaa tactgtcctt 3600 ctagtgtagc cgtagttagg ccaccacttc aagaactctgtagcaccgcc tacatacctc 3660 gctctgctaa tcctgttacc agtggctgct gccagtggcgataagtcgtg tcttaccggg 3720 ttggactcaa gacgatagtt accggataag gcgcagcggtcgggctgaac ggggggttcg 3780 tgcacacagc ccagcttgga gcgaacgacc tacaccgaactgagatacct acagcgtgag 3840 ctatgagaaa gcgccacgct tcccgaaggg agaaaggcggacaggtatcc ggtaagcggc 3900 agggtcggaa caggagagcg cacgagggag 
cttccagggggaaacgcctg gtatctttat 3960 agtcctgtcg ggtttcgcca cctctgactt gagcgtcgatttttgtgatg ctcgtcaggg 4020 gggcggagcc tatggaaaaa cgccagcaac gcggcctttttacggttcct ggccttttgc 4080 tggccttttg ctcacatgtt ctttcctgcg ttatcccctgattctgtgga taaccgtatt 4140 accgccatgc at 4152 4 6 DNA Artificialsequence SP1 site 4 gggcgg 6 5 7 DNA Artificial sequence TRE/AP-1element 5 tgactca 7 6 6 DNA Artificial sequence erythroid cell GATAelement 6 gataga 6 7 11 DNA Artificial sequence myeloid tumor elementNF-kB binding site 7 gggaattccc c 11 8 8 DNA Artificial sequence acyclic AMP response element 8 tgacgtca 8 9 8513 DNA Artificial sequencevector 9 gaattctcat gtttgacagc ttatcatcga ttagtccaat ttgttaaagacaggatatca 60 gtggtccagg ctcagttttg actcaacaat atcaccagct gaagcctatagagtacgagc 120 catagataga ataaaagatt ttatttagtc tccagaaaaa ggggggaatgaaagacccca 180 cctgtaggtt tggcaagcta gaaatgtagt cttatgcaat acacttgtagtcttgcaaca 240 tggtaacgat gagttagcaa catgccttac aaggagagaa aaagcaccgtgcatgccgat 300 tggtggaagt aaggtggtac gatcgtgcct tattaggaag gcaacagacaggtctgacat 360 ggattggacg aaccactcta gagaaccatc agatgtttcc agggtgccccaaggacctga 420 aaatgaccct gtgccttatt tgaactaacc aatcagttcg cttctcgcttctgttcgcgc 480 gcttctgctc cccgagctca ataaaagagc ccacaacccc tcactcggcgcgccagtcct 540 ccgattgact gcgtcgcccg ggtacccgta ttcccaataa agcctcttgctgtttgcatc 600 cgaatcgtgg actcgctgat ccttgggagg gtctcctcag attgattgactgcccacctc 660 ggggtctttc atttggaggt tccaccgaga tttggagacc ccagcccagggaccaccgac 720 ccccccgccg ggaggtaagc tggccagcgg tcgtttcgtg tctgtctctgtctttgtgcg 780 tgtttgtgcc ggcatctaat gtttgcgcct gcgtctgtac tagttagctaactagctctg 840 tatctggcgg acccgtggtg gaactgacga gttctgaaca cccggccgcaaccctgggag 900 acgtcccagg gactttgggg gccgtttttg tggcccgacc tgaggaagggagtcgatgtg 960 gaatccgacc ccgtcaggat atgtggttct ggtaggagac gagaacctaaaacagttccc 1020 gcctccgtct gaatttttgc tttcggtttg gaaccgaagc cgcgcgtcttgtctgctgca 1080 gcatcgttct gtgttgtctc tgtctgactg tgtttctgta tttgtctgaaaattagggcc 1140 agactgttac cactccctta agtttgacct taggtcactg 
gaaagatgtcgagcggatcg 1200 ctcacaacca gtcggtagat gtcaagaaga gacgttgggt taccttctgctctgcagaat 1260 ggccaacctt taacgtcgga tggccgcgag acggcacctt taaccgagacctcatcaccc 1320 aggttaagat caaggtcttt cacctggccc gcatggacac ccagaccaggtcccctacat 1380 cgtgacctgg gaagccttgg cttttgaccc ccctccctgg gtcaagccctttgtacaccc 1440 taagcctccg cctcctcttc ctccatccgc cccgtctctc ccccttgaacctcctcgttc 1500 gaccccgcct cgatcctccc tttatccagc cctcactcct tctctaggcgccggaattcg 1560 ttcatggtga gcaagggcga ggagctgttc accggggtgg tgcccatcctggtcgagctg 1620 gacggcgacg taaacggcca caagttcagc gtgtccggcg agggcgagggcgatgccacc 1680 tacggcaagc tgaccctgaa gttcatctgc accaccggca agctgcccgtgccctggccc 1740 accctcgtga ccaccctgac ctacggcgtg cagtgcttca gccgctaccccgaccacatg 1800 aagcagcacg acttcttcaa gtccgccatg cccgaaggct acgtccaggagcgcaccatc 1860 ttcttcaagg acgacggcaa ctacaagacc cgcgccgagg tgaagttcgagggcgacacc 1920 ctggtgaacc gcatcgagct gaagggcatc gacttcaagg aggacggcaacatcctgggg 1980 cacaagctgg agtacaacta caacagccac aacgtctata tcatggccgacaagcagaag 2040 aacggcatca aggccaactt caagacccgc cacaacatcg aggacggcggcgtgcagctc 2100 gccgaccact accagcagaa cacccccatc ggcgacggcc ccgtgctgctgcccgacaac 2160 cactacctga gcacccagtc cgccctgagc aaagacccca acgagaagcgcgatcacatg 2220 gtcctgctgg agttcgtgac cgccgccggg atcactctcg gcatggacgagctgtacaag 2280 taaagcggcc gcgactctag agtcgaggat cctctagagg aattcccgcccctctccctc 2340 ccccccccct aacgttactg gccgaagccg cttggaataa ggccggtgtgcgtttgtcta 2400 tatgttattt tccaccatat tgccgtcttt tggcaatgtg agggcccggaaacctggccc 2460 tgtcttcttg acgagcattc ctaggggtct ttcccctctc gccaaaggaatgcaaggtct 2520 gttgaatgtc gtgaaggaag cagttcctct ggaagcttct tgaagacaaacaacgtctgt 2580 agcgaccctt tgcaggcagc ggaacccccc acctggcgac aggtgcctctgcggccaaaa 2640 gccacgtgta taagatacac ctgcaaaggc ggcacaaccc cagtgccacgttgtgagttg 2700 gatagttgtg gaaagagtca aatggctctc ctcaagcgta ttcaacaagggggctgaagg 2760 atgcccagaa ggtaccccat tgtatgggat ctgatctggg gcctcggtgcacatgcttta 2820 catgtgttta gtcgaggtta aaaaaacgtc taggcccccc gaaccacggggacgtggttt 2880 tcctttgaaa 
aacacgatga taagcttgcc acaaccatgt tgcaaactaaggatctcatc 2940 tggactttgt ttttcctggg aactgcagtt tctctgcagg tggatattgttcccagccag 3000 ggggagatca gcgttggaga gtccaaattc ttcttatgcc aagtggcaggagatgccaaa 3060 gataaagaca tctcctggtt ctcccccaat ggagaaaagc tcaccccaaaccagcagcgg 3120 atctcagtgg tgtggaatga tgattcctcc tccaccctca ccatctataacgccaacatc 3180 gacgacgccg gcatttacaa gtgtgtggtt acaggcgagg atggcagtgagtcagaggcc 3240 accgtcaacg tgaagatctt tcagaagctc atgttcaaga atgcgccaaccccacaggag 3300 ttccgggagg gggaagatgc cgtgattgtg tgtgatgtgg tcagctccctcccaccaacc 3360 atcatctgga aacacaaagg ccgagatgtc atcctgaaaa aagatgtccgattcatagtc 3420 ctgtccaaca actacctgca gatccggggc atcaagaaaa cagatgagggcacttatcgc 3480 tgtgagggca gaatcctggc acggggggag atcaacttca aggacattcaggtcattgtg 3540 aatgtgccac ctaccatcca ggccaggcag aatattgtga atgccaccgccaacctcggc 3600 cagtccgtca ccctggtgtg cgatgccgaa ggcttcccag agcccaccatgagctggaca 3660 aaggatgggg aacagataga gcaagaggaa gacgatgaga agtacatcttcagcgacgat 3720 agttcccagc tgaccatcaa aaaggtggat aagaacgacg aggctgagtacatctgcatt 3780 gctgagaaca aggctggcga gcaggatgcg accatccacc tcaaagtctttgcaaaaccc 3840 aaaatcacat atgtagagaa ccagactgcc atggaattag aggagcaggtcactcttacc 3900 tgtgaagcct ccggagaccc cattccctcc atcacctgga ggacttctacccggaacatc 3960 agcagcgaag aaaagactct ggatgggcac atggtggtgc gtagccatgcccgtgtgtcg 4020 tcgctgaccc tgaagagcat ccagtacact gatgccggag agtacatctgcaccgccagc 4080 aacaccatcg gccaggactc ccagtccatg taccttgaag tgcaatatgccccaaagcta 4140 cagggccctg tggctgtgta cacttgggag gggaaccagg tgaacatcacctgcgaggta 4200 tttgcctatc ccagtgccac gatctcatgg tttcgggatg gccagctgctgccaagctcc 4260 aattacagca atatcaagat ctacaacacc ccctctgcca gctatctggaggtgacccca 4320 gactctgaga atgattttgg gaactacaac tgtactgcag tgaaccgcattgggcaggag 4380 tccttggaat tcatccttgt tcaagcagac accccctctt caccatccatcgaccaggtg 4440 gagccatact ccagcacagc ccaggtgcag tttgatgaac cagaggccacaggtggggtg 4500 cccatcctca aatacaaagc tgagtggaga gcagttggtg aagaagtatggcattccaag 4560 tggtatgatg ccaaggaagc cagcatggag ggcatcgtca 
ccatcgtgggcctgaagccc 4620 gaaacaacgt acgccgtaag gctggcggcg ctcaatggca aagggctgggtgagatcagc 4680 gcggcctccg agttcaagac gcagccagtc cgggaaccca gtgcacctaagctcgaaggg 4740 cagatgggag aggatggaaa ctctattaaa gtgaacctga tcaagcaggatgacggcggc 4800 tcccccatca gacactatct ggtcaggtac cgagcgctct cctccgagtggaaaccagag 4860 atcaggctcc cgtctggcag tgaccacgtc atgctgaagt ccctggactggaatgctgag 4920 tatgaggtct acgtggtggc tgagaaccag caaggaaaat ccaaggcggctcattttgtg 4980 ttcaggacct cggcccagcc cacagccatc ccagccaacg gcagccccacctcaggcctg 5040 agcaccgggg ccatcgtggg catcctcatc gtcatcttcg tcctgctcctggtggttgtg 5100 gacatcacct gctacttcct gaacaagtgt ggcctgttca tgtgcattgcggtcaacctg 5160 tgtggaaaag ccgggcccgg ggccaagggc aaggacatgg aggagggcaaggccgccttc 5220 tcgaaagatg agtccaagga gcccatcgtg gaggttcgaa cggaggaggagaggacccca 5280 aaccatgatg gagggaaaca cacagagccc aacgagacca cgccactgacggagcccgag 5340 aagggccccg tagaagcaaa gccagagtgc caggagacag aaacgaagccagcgccagcc 5400 gaagtcaaga cggtccccaa tgacgccaca cagacaaagg agaacgagagcaaagcatga 5460 tgggatcgtc gacggtatcg ataaaataaa agattttatt tagtctccagaaaaaggggg 5520 gaatgaaaga ccccacctgt aggtttggca agctagacat gcatcgggatatcctagcta 5580 gcccgctcga gcgaacgcgt gaagatctga aggggggcta taaaagcgatggatccgagc 5640 tcggccctca ttctggagac tctagaggcc ttgaattcgc ggccgcgccagtcctccgat 5700 tgactgcgtc gcccgggtac cgtgtatcca ataaaccctc ttgcagttgcatccgacttg 5760 tggtctcgct gttccttggg agggtctcct ctgagtgatt gactacccgtcagcgggggt 5820 ctttcatttg ggggctcgtc cgggatcggg agacccctgc ccagggaccaccgacccacc 5880 accgggaggt aagctggctg cctcgcgcgt ttcggtgatg acggtgaaaacctctgacac 5940 atgcagctcc cggagacggt cacagcttgt ctgtaagcgg atgccgggagcagacaagcc 6000 cgtcagggcg cgtcagcggg tgttggcggg tgtcggggcg cagccatgacccagtcacgt 6060 agcgatagcg gagtgtatac tggcttaact atgcggcatc agagcagattgtactgagag 6120 tgcaccatat gtccgcccat cccgccccta actccgccca gttccgcccattctccgccc 6180 catggctgac taattttttt tatttatgca gaggccgagg ccgcctcggcctctgagcta 6240 ttccagaagt agtgaggagg cttttttgga ggcctaggct tttgcaacatatgtccgccc 6300 atcccgcccc 
taactccgcc cagttccgcc cattctccgc cccatggctgactaattttt 6360 tttatttatg cagaggccga ggccgcctcg gcctctgagc tattccagaagtagtgagga 6420 ggcttttttg gaggcctagg cttttgcaac atatgcggtg tgaaataccgcacagatgcg 6480 taaggagaaa ataccgcatc aggcgctctt ccgcttcctc gctcactgactcgctgcgct 6540 cggtcgttcg gctgcggcga gcggtatcag ctcactcaaa ggcggtaatacggttatcca 6600 cagaatcagg ggataacgca ggaaagaaca tgtgagcaaa aggccagcaaaaggccagga 6660 accgtaaaaa ggccgcgttg ctggcgtttt tccataggct ccgcccccctgacgagcatc 6720 acaaaaatcg acgctcaagt cagaggtggc gaaacccgac aggactataaagataccagg 6780 cgtttccccc tggaagctcc ctcgtgcgct ctcctgttcc gaccctgccgcttaccggat 6840 acctgtccgc ctttctccct tcgggaagcg tggcgctttc tcatagctcacgctgtaggt 6900 atctcagttc ggtgtaggtc gttcgctcca agctgggctg tgtgcacgaaccccccgttc 6960 agcccgaccg ctgcgcctta tccggtaact atcgtcttga gtccaacccggtaagacacg 7020 acttatcgcc actggcagca gccactggta acaggattag cagagcgaggtatgtaggcg 7080 gtgctacaga gttcttgaag tggtggccta actacggcta cactagaaggacagtatttg 7140 gtatctgcgc tctgctgaag ccagttacct tcggaaaaag agttggtagctcttgatccg 7200 gcaaacaaac caccgctggt agcggtggtt tttttgtttg caagcagcagattacgcgca 7260 gaaaaaaagg atctcaagaa gatcctttga tcttttctac ggggtctgacgctcagtgga 7320 acgaaaactc acgttaaggg attttggtca tgagattatc aaaaaggatcttcacctaga 7380 tccttttaaa ttaaaaatga agttttaaat caatctaaag tatatatgagtaaacttggt 7440 ctgacagtta ccaatgctta atcagtgagg cacctatctc agcgatctgtctatttcgtt 7500 catccatagt tgcctgactc cccgtcgtgt agataactac gatacgggagggcttaccat 7560 ctggccccag tgctgcaatg ataccgcgag acccacgctc accggctccagatttatcag 7620 caataaacca gccagccgga agggccgagc gcagaagtgg tcctgcaactttatccgcct 7680 ccatccagtc tattaattgt tgccgggaag ctagagtaag tagttcgccagttaatagtt 7740 tgcgcaacgt tgttgccatt gctgcaggca tcgtggtgtc acgctcgtcgtttggtatgg 7800 cttcattcag ctccggttcc caacgatcaa ggcgagttac atgatcccccatgttgtgca 7860 aaaaagcggt tagctccttc ggtcctccga tcgttgtcag aagtaagttggccgcagtgt 7920 tatcactcat ggttatggca gcactgcata attctcttac tgtcatgccatccgtaagat 7980 gcttttctgt gactggtgag tactcaacca agtcattctg 
agaatagtgtatgcggcgac 8040 cgagttgctc ttgcccggcg tcaacacggg ataataccgc gccacatagcagaactttaa 8100 aagtgctcat cattggaaaa cgttcttcgg ggcgaaaact ctcaaggatcttaccgctgt 8160 tgagatccag ttcgatgtaa cccactcgtg cacccaactg atcttcagcatcttttactt 8220 tcaccagcgt ttctgggtga gcaaaaacag gaaggcaaaa tgccgcaaaaaagggaataa 8280 gggcgacacg gaaatgttga atactcatac tcttcctttt tcaatattattgaagcattt 8340 atcagggtta ttgtctcatg agcggataca tatttgaatg tatttagaaaaataaacaaa 8400 taggggttcc gcgcacattt ccccgaaaag tgccacctga cgtctaagaaaccattatta 8460 tcatgacatt aacctataaa aataggcgta tcacgaggcc ctttcgtcttcaa 8513 10 66 DNA Artificial sequence transcriptional regulatoryelement 10 cgctcgcccc ctcccgatcg cctttggatc acgcgtgatc cagggggaacgaatcaaagc 60 cgagcg 66 11 30 DNA Artificial sequence transcriptionalregulatory element 11 gatccagggc aagaaaagca ccagcgagcg 30 12 66 DNAArtificial sequence transcriptional regulatory element 12 gatccaggagggcaagggga ggggcgagcg acgcgtgatc cacgagcagc tggtgatgga 60 cgagcg 66 1366 DNA Artificial sequence transcriptional regulatory element 13cgctcgccct atgtgcgctc aacctggatc acgcgtcgct cgccacccac tttttgccct 60tggatc 66 14 66 DNA Artificial sequence transcriptional regulatoryelement 14 cgctcgtcct gccgtcgacc tccctggatc acgcgtgatc cacacaggagtagaaaacat 60 cgagcg 66 15 66 DNA Artificial sequence transcriptionalregulatory element 15 cgctcggcac gcattagccc ctggtggatc acgcgtgatccagggcagac gggagagaga 60 cgagcg 66 16 65 DNA Artificial sequencetranscriptional regulatory element 16 cgctcgcttc ccgccccccc ctatggatcacgcgtcgctc gtccttctgc gtaacctttt 60 ggatc 65 17 66 DNA Artificialsequence transcriptional regulatory element 17 cgctcgaacc ctccctgttctttttggatc acgcgtcgct cgccccctcc tccctctcgc 60 tggatc 66 18 18 DNAArtificial sequence flanking sequence 18 ctactcacgc gtgatcca 18 19 18DNA Artificial sequence flanking sequence 19 cggcgaacgc gtgcaatg 18 2066 DNA Artificial sequence transcriptional regulatory element 20cgctcgcctg tccgccgcac ttgttggatc acgcgtgatc caccaggaag tgacgtatca 60cgagcg 66 21 66 DNA 
Artificial sequence transcriptional regulatoryelement 21 cgctcgcaac tctttccccc cccctggacc acgcgtgatc caccaggaagtgacgtatca 60 cgagcg 66 22 138 DNA Artificial sequence transcriptionalregulatory element 22 gatccaggga ggggtagggt ctatcgagcg acgcgtcgctcgtctcctct acacccgctg 60 tggatcacgc gtcgctcgtt gccctcccct tcctcatggatcacgcgtcg ctcgctgtcc 120 ccgccccact cctggatc 138 23 105 DNA Artificialsequence transcriptional regulatory element 23 gatccaagag cgggcagggattggcgagcg acgcgtcgtc gctcgtcccg ccccctctat 60 gcttggatca cgcgtcgctcgtcctcttct ttccttccct ggatc 105 24 30 DNA Artificial sequencetranscriptional regulatory element 24 cgctcggccc cgccctcttc cccctggatc30 25 65 DNA Artificial sequence transcriptional regulatory element 25cgctcgctct tgtgtacctc tccttggatc acgcgtcgct cgccatcttc tgtcgctgct 60ggatc 65 26 30 DNA Artificial sequence transcriptional regulatoryelement 26 cgctcgtctc ttctcgcccc cccctggatc 30 27 66 DNA Artificialsequence transcriptional regulatory element 27 cgctcgcccc tcccctaagcgcgttggatc acgcgtgatc caacgggcaa tgaaacgaat 60 cgagcg 66 28 66 DNAArtificial sequence transcriptional regulatory element 28 cgctcgctggccccgccctt agtttggatc acgcgtcgct cgaccccgcc tttcgtatct 60 tggatc 66 2966 DNA Artificial sequence transcriptional regulatory element 29cgctcgtcgc ctgggttctg ctactggatc acgcgtgatc cagaagagcg gaaggaggga 60cgagcg 66 30 30 DNA Artificial sequence transcriptional regulatoryelement 30 cgctcgcctt cccttacttc acgctggatc 30 31 65 DNA Artificialsequence transcriptional regulatory element 31 cgctcgcctc acgcgaattccccctggatc acgcgtgatc cagagaaggg agggggggac 60 gagcg 65 32 30 DNAArtificial sequence transcriptional regulatory element 32 gatccaggggcaaaaaggga ggggcgagcg 30 33 30 DNA Artificial sequence transcriptionalregulatory element 33 gatccaggtg gggctagtga cgtgcgagcg 30 34 66 DNAArtificial sequence transcriptional regulatory element 34 gatccagatagacgggagtg aaaacgagcg acgcgtgatc caagcggagg agggatgtga 60 cgagcg 66 3566 DNA Artificial sequence transcriptional regulatory element 
35gatccaatca aggaggaggg atagcgagcg acgcgtcgct cgtttccggt cttatgtttg 60tggatc 66 36 66 DNA Artificial sequence transcriptional regulatoryelement 36 cgctcgcccc ccgccctctt tgcctggatc acgcgtgatc caggtggggctagtgacgtg 60 cgacgc 66 37 30 DNA Artificial sequence transcriptionalregulatory element 37 gatccagaaa agtgagggga ggggcgagcg 30 38 66 DNAArtificial sequence transcriptional regulatory element 38 gatccagggacagtgagggg gggacgagcg acgcgttgct cgtccatttc acgcccccgc 60 tggatc 66 3930 DNA Artificial sequence transcriptional regulatory element 39gatccaactg gagagtaacg ccctcgagcg 30 40 12 DNA Artificial sequencetranscriptional regulatory element 40 ggcattcatc gt 12 41 12 DNAArtificial sequence transcriptional regulatory element 41 gcattagtat ct12 42 12 DNA Artificial sequence transcriptional regulatory element 42tcggttattg tt 12 43 12 DNA Artificial sequence transcriptionalregulatory element 43 tccaattggg aa 12 44 12 DNA Artificial sequencetranscriptional regulatory element 44 atctattggc ca 12 45 12 DNAArtificial sequence transcriptional regulatory element 45 ttactgggtg tt12 46 12 DNA Artificial sequence transcriptional regulatory element 46agggtgaagg tc 12 47 12 DNA Artificial sequence transcriptionalregulatory element 47 ggtgggtgtg tc 12 48 12 DNA Artificial sequencetranscriptional regulatory element 48 cgcttcaatg ct 12 49 12 DNAArtificial sequence transcriptional regulatory element 49 tgcttcaatg cc12 50 12 DNA Artificial sequence transcriptional regulatory element 50tgtgtctttg ca 12 51 12 DNA Artificial sequence transcriptionalregulatory element 51 cacggggaca gc 12 52 12 DNA Artificial sequencetranscriptional regulatory element 52 aagctgtaca tg 12 53 12 DNAArtificial sequence transcriptional regulatory element 53 gatgggggca ca12 54 12 DNA Artificial sequence transcriptional regulatory element 54atatgtgccc tt 12 55 12 DNA Artificial sequence transcriptionalregulatory element 55 tccttctggg tc 12 56 12 DNA Artificial sequencetranscriptional regulatory element 56 ggtgggtgtg tc 12 
57 12 DNAArtificial sequence transcriptional regulatory element 57 gaatggatgg gg12 58 12 DNA Artificial sequence transcriptional regulatory element 58catgtgatat tc 12 59 12 DNA Artificial sequence transcriptionalregulatory element 59 aggagggttt gt 12 60 12 DNA Artificial sequencetranscriptional regulatory element 60 tgggcgagtg gg 12 61 12 DNAArtificial sequence transcriptional regulatory element 61 cggctcacca gt12 62 12 DNA Artificial sequence transcriptional regulatory element 62ggtttctata ac 12 63 12 DNA Artificial sequence transcriptionalregulatory element 63 ggtgggtgtg tc 12 64 12 DNA Artificial sequencetranscriptional regulatory element 64 ttactgggtg tt 12 65 12 DNAArtificial sequence transcriptional regulatory element 65 aagtctttgg gt12 66 12 DNA Artificial sequence transcriptional regulatory element 66ggttgggtcc cc 12 67 12 DNA Artificial sequence transcriptionalregulatory element 67 ttgggtcatt gt 12 68 12 DNA Artificial sequencetranscriptional regulatory element 68 ttgggtcgtt gt 12 69 12 DNAArtificial sequence transcriptional regulatory element 69 tctgggtcgc gc12 70 12 DNA Artificial sequence transcriptional regulatory element 70tccttctggg tc 12 71 12 DNA Artificial sequence transcriptionalregulatory element 71 cctttgtggg tc 12 72 12 DNA Artificial sequencetranscriptional regulatory element 72 tcacttctgg gc 12 73 12 DNAArtificial sequence transcriptional regulatory element 73 ctagtgggag ct12 74 12 DNA Artificial sequence transcriptional regulatory element 74tgggcgagtg gg 12 75 12 DNA Artificial sequence transcriptionalregulatory element 75 tgcttcaatg cc 12 76 12 DNA Artificial sequencetranscriptional regulatory element 76 cgcctcgatg cc 12 77 12 DNAArtificial sequence transcriptional regulatory element 77 agggtgaagg tc12 78 12 DNA Artificial sequence transcriptional regulatory element 78acccggggaa gg 12 79 12 DNA Artificial sequence transcriptionalregulatory element 79 tgtgtctttg ca 12 80 12 DNA Artificial sequencetranscriptional regulatory element 80 
cgaactttgc aa 12 81 12 DNAArtificial sequence transcriptional regulatory element 81 tgagtaagct at12 82 12 DNA Artificial sequence transcriptional regulatory element 82tatgtaagaa cg 12 83 6 DNA Artificial sequence core motif 83 ttgggt 6 848 DNA Artificial sequence core motif 84 ctagtggg 8 85 5 DNA Artificialsequence core motif 85 atgcc 5 86 5 DNA Artificial sequence core motif86 gaagg 5 87 8 DNA Artificial sequence core motif 87 cttttgca 8 88 163DNA Saccharomyces cerevisiae 88 accgattaag cacagtacct ttacgttatatataggattg gtgtttagct ttttttcctg 60 agcccctggt tgacttgtgc atgaacacgagccattttta gtttgtttaa gggaagtttt 120 ttgccaccca aaacgtttaa agaaggaaaagttgtttctt aaa 163 89 511 DNA Saccharomyces cerevisiae 89 aatcatttttttgaaaatta cattaataag gcttttttca atatctctgg aacaacagtt 60 tgtttctacttactaatagc tttaaggacc ctcttggaca tcatgatggc agacttccat 120 cgtagtgggatgatcatatg atgggcgcta tcctcatcgc gactcgataa cgacgtgaga 180 aacgatttttttttttcttt ttcaccgtat ttttgtgcgt cctttttcaa ttatagcttt 240 tttttattttttttttttct cgtactgttt cactgacaaa agtttttttt caagaaaaat 300 tttcgatgccgcgttctctg tgtgcaacgg atggatggta gatggaattt caatatgttg 360 cttgaaattttaccaatctt gatattgtga taatttactt aattatgatt cttcctcttc 420 ccttcaatttcttaaagctt cttactttac tccttcttgc tcataaataa gcaaggtaag 480 aggacaactgtaattaccta ttacaataat g 511 90 10 RNA Saccharomyces cerevisiae 90acgagccauu 10 91 10 RNA Saccharomyces cerevisiae 91 aauggcucau 10 92 16RNA Saccharomyces cerevisiae 92 gaaauuugca aaaccc 16 93 16 RNASaccharomyces cerevisiae 93 cuuagaacgu ucuggg 16 94 13 RNA Saccharomycescerevisiae 94 cagacuucca ucg 13 95 13 RNA Saccharomyces cerevisiae 95cgauggaagu uug 13 96 19 RNA Saccharomyces cerevisiae 96 gcgcuauccucaucgcgac 19 97 19 RNA Saccharomyces cerevisiae 97 gucgugcugg ggauagagc19 98 15 RNA Saccharomyces cerevisiae 98 uuaugauucu uccuc 15 99 15 RNASaccharomyces cerevisiae 99 gcggaaggau cauua 15 100 25 RNA Saccharomycescerevisiae 100 cuucccuuca auuucuuaaa gcuuc 25 101 25 RNA Saccharomycescerevisiae 101 gaaacuuaaa ggaauugacg gaagg 
25 102 9 DNA Mus musculus 102ccggcgggt 9 103 42 DNA Artificial sequence oligonucleotide containing 18random nucleotide 103 acgcgtgatc cannnnnnnn nnnnnnnnnn cgagcgacgc gt 42104 135 DNA Artificial sequence oligonucleotide containing two segmentsof 9 random nucleotides 104 ttaattaaga attcttctga cataaaaaaa aattctgacataaaaaaaaa ttctgacata 60 aaaaaaaann nnnnnnnaaa aaaaaannnn nnnnnaaaaaaaaagactca caaccccaga 120 aacagacata cgcgt 135 105 30 RNA Artificialsequence ICS 1-23 a-b 105 gauccagagc aggaacagcg gaaacgagcg 30 106 15 RNAArtificial sequence ICS 1-23 b 106 cagcggaaac gagcg 15 107 14 RNASaccharomyces cerevisiae 107 uucucgauuc cgug 14 108 9 DNA Artificialsequence 9 nt segment designated as ICS2-17.2 108 tccggtcgt 9 109 6250DNA Artificial sequence vector 109 gaattctcat gtttgacagc ttatcatcgattagtccaat ttgttaaaga caggatatca 60 gtggtccagg ctcagttttg actcaacaatatcaccagct gaagcctata gagtacgagc 120 catagataga ataaaagatt ttatttagtctccagaaaaa ggggggaatg aaagacccca 180 cctgtaggtt tggcaagcta gaaatgtagtcttatgcaat acacttgtag tcttgcaaca 240 tggtaacgat gagttagcaa catgccttacaaggagagaa aaagcaccgt gcatgccgat 300 tggtggaagt aaggtggtac gatcgtgccttattaggaag gcaacagaca ggtctgacat 360 ggattggacg aaccactcta gagaaccatcagatgtttcc agggtgcccc aaggacctga 420 aaatgaccct gtgccttatt tgaactaaccaatcagttcg cttctcgctt ctgttcgcgc 480 gcttctgctc cccgagctca ataaaagagcccacaacccc tcactcggcg cgccagtcct 540 ccgattgact gcgtcgcccg ggtacccgtattcccaataa agcctcttgc tgtttgcatc 600 cgaatcgtgg actcgctgat ccttgggagggtctcctcag attgattgac tgcccacctc 660 ggggtctttc atttggaggt tccaccgagatttggagacc ccagcccagg gaccaccgac 720 ccccccgccg ggaggtaagc tggccagcggtcgtttcgtg tctgtctctg tctttgtgcg 780 tgtttgtgcc ggcatctaat gtttgcgcctgcgtctgtac tagttagcta actagctctg 840 tatctggcgg acccgtggtg gaactgacgagttctgaaca cccggccgca accctgggag 900 acgtcccagg gactttgggg gccgtttttgtggcccgacc tgaggaaggg agtcgatgtg 960 gaatccgacc ccgtcaggat atgtggttctggtaggagac gagaacctaa aacagttccc 1020 gcctccgtct gaatttttgc tttcggtttggaaccgaagc cgcgcgtctt gtctgctgca 
1080 gcatcgttct gtgttgtctc tgtctgactgtgtttctgta tttgtctgaa aattagggcc 1140 agactgttac cactccctta agtttgaccttaggtcactg gaaagatgtc gagcggatcg 1200 ctcacaacca gtcggtagat gtcaagaagagacgttgggt taccttctgc tctgcagaat 1260 ggccaacctt taacgtcgga tggccgcgagacggcacctt taaccgagac ctcatcaccc 1320 aggttaagat caaggtcttt cacctggcccgcatggacac ccagaccagg tcccctacat 1380 cgtgacctgg gaagccttgg cttttgacccccctccctgg gtcaagccct ttgtacaccc 1440 taagcctccg cctcctcttc ctccatccgccccgtctctc ccccttgaac ctcctcgttc 1500 gaccccgcct cgatcctccc tttatccagccctcactcct tctctaggcg ccggaattcg 1560 ttcatggtga gcaagggcga ggagctgttcaccggggtgg tgcccatcct ggtcgagctg 1620 gacggcgacg taaacggcca caagttcagcgtgtccggcg agggcgaggg cgatgccacc 1680 tacggcaagc tgaccctgaa gttcatctgcaccaccggca agctgcccgt gccctggccc 1740 accctcgtga ccaccctgac ctacggcgtgcagtgcttca gccgctaccc cgaccacatg 1800 aagcagcacg acttcttcaa gtccgccatgcccgaaggct acgtccagga gcgcaccatc 1860 ttcttcaagg acgacggcaa ctacaagacccgcgccgagg tgaagttcga gggcgacacc 1920 ctggtgaacc gcatcgagct gaagggcatcgacttcaagg aggacggcaa catcctgggg 1980 cacaagctgg agtacaacta caacagccacaacgtctata tcatggccga caagcagaag 2040 aacggcatca aggccaactt caagacccgccacaacatcg aggacggcgg cgtgcagctc 2100 gccgaccact accagcagaa cacccccatcggcgacggcc ccgtgctgct gcccgacaac 2160 cactacctga gcacccagtc cgccctgagcaaagacccca acgagaagcg cgatcacatg 2220 gtcctgctgg agttcgtgac cgccgccgggatcactctcg gcatggacga gctgtacaag 2280 taaagcggcc gcgactctag agtcgaggatccgctagcta gttaattaat cgcgacgacg 2340 cgtcgccatg gtgagcaagg gcgaggagctgttcaccggg gtggtgccca tcctggtcga 2400 gctggacggc gacgtaaacg gccacaagttcagcgtgtcc ggcgagggcg agggcgatgc 2460 cacctacggc aagctgaccc tgaagttcatctgcaccacc ggcaagctgc ccgtgccctg 2520 gcccaccctc gtgaccaccc tgacctggggcgtgcagtgc ttcagccgct accccgacca 2580 catgaagcag cacgacttct tcaagtccgccatgcccgaa ggctacgtcc aggagcgcac 2640 catcttcttc aaggacgacg gcaactacaagacccgcgcc gaggtgaagt tcgagggcga 2700 caccctggtg aaccgcatcg agctgaagggcatcgacttc aaggaggacg gcaacatcct 2760 ggggcacaag ctggagtaca 
actacatcagccacaacgtc tatatcaccg ccgacaagca 2820 gaagaacggc atcaaggcca acttcaagatccgccacaac atcgaggacg gcagcgtgca 2880 gctcgccgac cactaccagc agaacacccccatcggcgac ggccccgtgc tgctgcccga 2940 caaccactac ctgagcaccc agtccgccctgagcaaagac cccaacgaga agcgcgatca 3000 catggtcctg ctggagttcg tgaccgccgccgggatcact ctcggcatgg acgagctgta 3060 caagtaagtc gacggtatcg ataaaataaaagattttatt tagtctccag aaaaaggggg 3120 gaatgaaaga ccccacctgt aggtttggcaagctagaatg cataaatgta gtcttatgca 3180 atacacttgt agtcttgcaa catggtaacgatgagttagc aacatgcctt acaaggagag 3240 aaaaagcacc gtgcatgccg attggtggaagtaaggtggt acgatcgtgc cttattagga 3300 aggcaacaga caggtctgac atggattggacgaaccacta gatctgaagg ggggctataa 3360 aagcgatgga tccgagctcg gccctcattctggagactct agaggccttg aattcgcggc 3420 cgcgccagtc ctccgattga ctgcgtcgcccgggtaccgt gtatccaata aaccctcttg 3480 cagttgcatc cgacttgtgg tctcgctgttccttgggagg gtctcctctg agtgattgac 3540 tacccgtcag cgggggtctt tcatttgggggctcgtccgg gatcgggaga cccctgccca 3600 gggaccaccg acccaccacc gggaggtaagctggctgcct cgcgcgtttc ggtgatgacg 3660 gtgaaaacct ctgacacatg cagctcccggagacggtcac agcttgtctg taagcggatg 3720 ccgggagcag acaagcccgt cagggcgcgtcagcgggtgt tggcgggtgt cggggcgcag 3780 ccatgaccca gtcacgtagc gatagcggagtgtatactgg cttaactatg cggcatcaga 3840 gcagattgta ctgagagtgc accatatgtccgcccatccc gcccctaact ccgcccagtt 3900 ccgcccattc tccgccccat ggctgactaattttttttat ttatgcagag gccgaggccg 3960 cctcggcctc tgagctattc cagaagtagtgaggaggctt ttttggaggc ctaggctttt 4020 gcaacatatg tccgcccatc ccgcccctaactccgcccag ttccgcccat tctccgcccc 4080 atggctgact aatttttttt atttatgcagaggccgaggc cgcctcggcc tctgagctat 4140 tccagaagta gtgaggaggc ttttttggaggcctaggctt ttgcaacata tgcggtgtga 4200 aataccgcac agatgcgtaa ggagaaaataccgcatcagg cgctcttccg cttcctcgct 4260 cactgactcg ctgcgctcgg tcgttcggctgcggcgagcg gtatcagctc actcaaaggc 4320 ggtaatacgg ttatccacag aatcaggggataacgcagga aagaacatgt gagcaaaagg 4380 ccagcaaaag gccaggaacc gtaaaaaggccgcgttgctg gcgtttttcc ataggctccg 4440 cccccctgac gagcatcaca aaaatcgacgctcaagtcag aggtggcgaa 
acccgacagg 4500 actataaaga taccaggcgt ttccccctggaagctccctc gtgcgctctc ctgttccgac 4560 cctgccgctt accggatacc tgtccgcctttctcccttcg ggaagcgtgg cgctttctca 4620 tagctcacgc tgtaggtatc tcagttcggtgtaggtcgtt cgctccaagc tgggctgtgt 4680 gcacgaaccc cccgttcagc ccgaccgctgcgccttatcc ggtaactatc gtcttgagtc 4740 caacccggta agacacgact tatcgccactggcagcagcc actggtaaca ggattagcag 4800 agcgaggtat gtaggcggtg ctacagagttcttgaagtgg tggcctaact acggctacac 4860 tagaaggaca gtatttggta tctgcgctctgctgaagcca gttaccttcg gaaaaagagt 4920 tggtagctct tgatccggca aacaaaccaccgctggtagc ggtggttttt ttgtttgcaa 4980 gcagcagatt acgcgcagaa aaaaaggatctcaagaagat cctttgatct tttctacggg 5040 gtctgacgct cagtggaacg aaaactcacgttaagggatt ttggtcatga gattatcaaa 5100 aaggatcttc acctagatcc ttttaaattaaaaatgaagt tttaaatcaa tctaaagtat 5160 atatgagtaa acttggtctg acagttaccaatgcttaatc agtgaggcac ctatctcagc 5220 gatctgtcta tttcgttcat ccatagttgcctgactcccc gtcgtgtaga taactacgat 5280 acgggagggc ttaccatctg gccccagtgctgcaatgata ccgcgagacc cacgctcacc 5340 ggctccagat ttatcagcaa taaaccagccagccggaagg gccgagcgca gaagtggtcc 5400 tgcaacttta tccgcctcca tccagtctattaattgttgc cgggaagcta gagtaagtag 5460 ttcgccagtt aatagtttgc gcaacgttgttgccattgct gcaggcatcg tggtgtcacg 5520 ctcgtcgttt ggtatggctt cattcagctccggttcccaa cgatcaaggc gagttacatg 5580 atcccccatg ttgtgcaaaa aagcggttagctccttcggt cctccgatcg ttgtcagaag 5640 taagttggcc gcagtgttat cactcatggttatggcagca ctgcataatt ctcttactgt 5700 catgccatcc gtaagatgct tttctgtgactggtgagtac tcaaccaagt cattctgaga 5760 atagtgtatg cggcgaccga gttgctcttgcccggcgtca acacgggata ataccgcgcc 5820 acatagcaga actttaaaag tgctcatcattggaaaacgt tcttcggggc gaaaactctc 5880 aaggatctta ccgctgttga gatccagttcgatgtaaccc actcgtgcac ccaactgatc 5940 ttcagcatct tttactttca ccagcgtttctgggtgagca aaaacaggaa ggcaaaatgc 6000 cgcaaaaaag ggaataaggg cgacacggaaatgttgaata ctcatactct tcctttttca 6060 atattattga agcatttatc agggttattgtctcatgagc ggatacatat ttgaatgtat 6120 ttagaaaaat aaacaaatag gggttccgcgcacatttccc cgaaaagtgc cacctgacgt 6180 ctaagaaacc attattatca 
tgacattaacctataaaaat aggcgtatca cgaggccctt 6240 tcgtcttcaa 6250 110 1798 DNASaccharomyces cerevisiae 110 tatctggttg atcctgccag tagtcatatg cttgtctcaaagattaagcc atgcatgtct 60 aagtataagc aatttataca gtgaaactgc gaatggctcattaaatcagt tatcgtttat 120 ttgatagttc ctttactaca tggtataacc gtggtaattctagagctaat acatgcttaa 180 aatctcgacc ctttggaaga gatgtattta ttagataaaaaatcaatgtc ttcgcactct 240 ttgatgattc ataataactt ttcgaatcgc atggccttgtgctggcgatg gttcattcaa 300 atttctgccc tatcaacttt cgatggtagg atagtggcctaccatggttt caacgggtaa 360 cggggaataa gggttcgatt ccggagaggg agcctgagaaacggctacca catccaagga 420 aggcagcagg cgcgcaaatt acccaatcct aattcagggaggtagtgaca ataaataacg 480 atacagggcc cattcgggtc ttgtaattgg aatgagtacaatgtaaatac cttaacgagg 540 aacaattgga gggcaagtct ggtgccagca gccgcggtaattccagctcc aatagcgtat 600 attaaagttg ttgcagttaa aaagctcgta gttgaactttgggcccggtt ggccggtccg 660 attttttcgt gtactggatt tccaacgggg cctttccttctggctaacct tgagtccttg 720 tggctcttgg cgaaccagga cttttacttt gaaaaaattagagtgttcaa agcaggcgta 780 ttgctcgaat atattagcat ggaataatag aataggacgtttggttctat tttgttggtt 840 tctaggacca tcgtaatgat taatagggac ggtcgggggcatcggtattc aattgtcgag 900 gtgaaattct tggatttatt gaagactaac tactgcgaaagcatttgcca aggacgtttt 960 cattaatcaa gaacgaaagt taggggatcg aagatgatctggtaccgtcg tagtcttaac 1020 cataaactat gccgactaga tcgggtggtg tttttttaatgacccactcg gtaccttacg 1080 agaaatcaaa gtctttgggt tctgggggga gtatggtcgcaaggctgaaa cttaaaggaa 1140 ttgacggaag ggcaccacta ggagtggagc ctgcggctaatttgactcaa cacggggaaa 1200 ctcaccaggt ccagacacaa taaggattga cagattgagagctctttctt gattttgtgg 1260 gtggtggtgc atggccgttt ctcagttggt ggagtgatttgtctgcttaa ttgcgataac 1320 gaacgagacc ttaacctact aaatagtggt gctagcatttgctggttatc cacttcttag 1380 agggactatc ggtttcaagc cgatggaagt ttgaggcaataacaggtctg tgatgccctt 1440 agaacgttct gggccgcacg cgcgctacac tgacggagccagcgagtcta accttggccg 1500 agaggtcttg gtaatcttgt gaaactccgt cgtgctggggatagagcatt gtaattattg 1560 ctcttcaacg aggaattcct agtaagcgca agtcatcagcttgcgttgat tacgtccctg 1620 ccctttgtac acaccgcccg 
tcgctagtac cgattgaatggcttagtgag gcctcaggat 1680 ctgcttagag aagggggcaa ctccatctca gagcggagaatttggacaaa cttggtcatt 1740 tagaggaact aaaagtcgta acaaggtttc cgtaggtgaacctgcggaag gatcatta 1798 111 1869 DNA Mus musculus 111 tacctggttgatcctgccag tagcatatgc ttgtctcaaa gattaagcca tgcatgtcta 60 agtacgcacggccggtacag tgaaactgcg aatggctcat taaatcagtt atggttcctt 120 tggtcgctcgctcctctcct acttggataa ctgtggtaat tctagagcta atacatgccg 180 acgggcgctgaccccccttc ccgggggggg atgcgtgcat ttatcagatc aaaaccaacc 240 cggtgagctccctcccggct ccggccgggg gtcgggcgcc ggcggcttgg tgactctaga 300 taacctcgggccgatcgcac gccccccgtg gcggcgacga cccattcgaa cgtctgccct 360 atcaactttcgatggtagtc gccgtgccta ccatggtgac cacgggtgac ggggaatcag 420 ggttcgattccggagaggga gcctgagaaa cggctaccac atccaaggaa ggcagcaggc 480 gcgcaaattacccactcccg acccggggag gtagtgacga aaaataacaa tacaggactc 540 tttcgaggccctgtaattgg aatgagtcca ctttaaatcc tttaacgagg atccattgga 600 gggcaagtctggtgccagca gccgcggtaa ttccagctcc aatagcgtat attaaagttg 660 ctgcagttaaaaagctcgta gttggatctt gggagcgggc gggcggtccg ccgcgaggcg 720 agtcaccgcccgtccccgcc ccttgcctct cggcgccccc tcgatgctct tagctgagtg 780 tcccgcggggcccgaagcgt ttactttgaa aaaattagag tgttcaaagc aggcccgagc 840 cgcctggataccgcagctag gaataatgga ataggaccgc ggttctattt tgttggtttt 900 cggaactgaggccatgatta agagggacgg ccgggggcat tcgtattgcg ccgctagagg 960 tgaaattcttggaccggcgc aagacggacc agagcgaaag catttgccaa gaatgttttc 1020 attaatcaagaacgaaagtc ggaggttcga agacgatcag ataccgtcgt agttccgacc 1080 ataaacgatgccgactggcg atgcggcggc gttattccca tgacccgccg ggcagcttcc 1140 gggaaaccaaagtctttggg ttccgggggg agtatggttg caaagctgaa acttaaagga 1200 attgacggaagggcaccacc aggagtgggc ctgcggctta atttgactca acacgggaaa 1260 cctcacccggcccggacacg gacaggattg acagattgat agctctttct cgattccgtg 1320 ggtggtggtgcatggccgtt cttagttggt ggagcgattt gtctggttaa ttccgataac 1380 gaacgagactctggcatgct aactagttac gcgacccccg agcggtcggc gtcccccaac 1440 ttcttagagggacaagtggc gttcagccac ccgagattga gcaataacag gtctgtgatg 1500 cccttagatgtccggggctg cacgcgcgct acactgactg 
gctcagcgtg tgcctaccct 1560 gcgccggcaggcgcgggtaa cccgttgaac cccattcgtg atggggatcg gggattgcaa 1620 ttattccccatgaacgagga attcccagta agtgcgggtc ataagcttgc gttgattaag 1680 tccctgccctttgtacacac cgcccgtcgc tactaccgat tggatggttt agtgaggccc 1740 tcggatcggccccgccgggg tcggcccacg gccctggcgg agcgctgaga agacggtcga 1800 acttgactatctagaggaag taaaagtcgt aacaaggttt ccgtaggtga acctgcggaa 1860 ggatcatta1869 112 1869 DNA Homo sapiens modified_base (27)..(27)m2a--2′-o-methyladenosine (genebank # 36162) 112 tacctggttg atcctgccagtagcatatgc ttgtctcaaa gattaagcca tgcatgtcta 60 agtacgcacg gccggtacagtgaaactgcg aatggctcat taaatcagtt atggttcctt 120 tggtcgctcg ctcctctcccacttggataa ctgtggtaat tctagagcta atacatgccg 180 acgggcgctg acccccttcgcgggggggat gcgtgcattt atcagatcaa aaccaacccg 240 gtcagcccct ctccggccccggccgggggg cgggcgccgg cggctttggt gactctagat 300 aacctcgggc cgatcgcacgccccccgtgg cggcgacgac ccattcgaac gtctgcccta 360 tcaactttcg atggtagtcgccgtgcctac catggtgacc acgggtgacg gggaatcagg 420 gttcgattcc ggagagggagcctgagaaac ggctaccaca tccaaggaag gcagcaggcg 480 cgcaaattac ccactcccgacccggggagg tagtgacgaa aaataacaat acaggactct 540 ttcgaggccc tgtaattggaatgagtccac tttaaatcct ttaacgagga tccattggag 600 ggcaagtctg gtgccagcagccgcggtaat tccagctcca atagcgtata ttaaagttgc 660 tgcagttaaa aagctcgtagttggatcttg ggagcgggcg ggcggtccgc cgcgaggcga 720 gccaccgccc gtccccgccccttgcctctc ggcgccccct cgatgctctt agctgagtgt 780 cccgcggggc ccgaagcgtttactttgaaa aaattagagt gttcaaagca ggcccgagcc 840 gcctggatac cgcagctaggaataatggaa taggaccgcg gttctatttt gttggttttc 900 ggaactgagg ccatgattaagagggacggc cgggggcatt cgtattgcgc cgctagaggt 960 gaaattcttg gaccggcgcaagacggacca gagcgaaagc atttgccaag aatgttttca 1020 ttaatcaaga acgaaagtcggaggttcgaa gacgatcaga taccgtcgta gttccgacca 1080 taaacgatgc cgaccggcgatgcggcggcg ttattcccat gacccgccgg gcagcttccg 1140 ggaaaccaaa gtctttgggttccgggggga gtatggttgc aaagctgaaa cttaaaggaa 1200 ttgacggaag ggcaccaccaggagtggagc ctgcggctta atttgactca acacgggaaa 1260 cctcacccgg cccggacacggacaggattg acagattgat agctctttct 
cgattccgtg 1320 ggtggtggtg catggccgttcttagttggt ggagcgattt gtctggttaa ttccgataac 1380 gaacgagact ctggcatgctaactagttac gcgacccccg agcggtcggc gtcccccaac 1440 ttcttagagg gacaagtggcgttcagccac ccgagattga gcaataacag gtctgtgatg 1500 cccttagatg tccggggctgcacgcgcgct acactgactg gctcagcgtg tgcctaccct 1560 acgccggcag gcgcgggtaacccgttgaac cccattcgtg atggggatcg gggattgcaa 1620 ttattcccca tgaacgaggaattcccagta agtgcgggtc ataagcttgc gttgattaag 1680 tccctgccct ttgtacacaccgcccgtcgc tactaccgat tggatggttt agtgaggccc 1740 tcggatcggc cccgccggggtcggcccacg gccctggcgg agcgctgaga agacggtcga 1800 acttgactat ctagaggaagtaaaagtcgt aacaaggttt ccgtaggtga acctgcggaa 1860 ggatcatta 1869
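
As an illustration only, the randomized oligonucleotide listed above as SEQ ID NO: 103 (42 nucleotides, 18 of them random) can be expanded into concrete library members. The sketch below is hypothetical: the function names, the fixed seed, and the assumption that each random position is drawn uniformly from a, c, g and t are not taken from the specification.

```python
import random

# Template copied from SEQ ID NO: 103 in the listing above:
# two fixed 12-nt flanks around 18 random positions ("n").
SEQ_ID_103 = "acgcgtgatcca" + "n" * 18 + "cgagcgacgcgt"


def fill_random_positions(template, rng):
    """Return a copy of template with each 'n' replaced by a random base.

    Assumption: bases are drawn uniformly from a, c, g, t.
    """
    return "".join(rng.choice("acgt") if base == "n" else base
                   for base in template)


def make_library(template, size, seed=0):
    """Build a list of `size` oligonucleotides from the template (hypothetical helper)."""
    rng = random.Random(seed)
    return [fill_random_positions(template, rng) for _ in range(size)]


library = make_library(SEQ_ID_103, size=5)
for oligo in library:
    print(oligo)
```

Each generated member keeps the fixed flanking sequences of the template and differs only in the 18 central positions.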

What is claimed is:
 1. A method of identifying an oligonucleotide having transcriptional or translational regulatory activity in a eukaryotic cell, the method comprising: a) integrating an oligonucleotide to be examined for transcriptional or translational regulatory activity into a eukaryotic cell genome, wherein the oligonucleotide is operatively linked to an expressible polynucleotide, and b) detecting a change in the level of expression of the expressible polynucleotide in the presence of the oligonucleotide as compared to the absence of the oligonucleotide, thereby identifying the oligonucleotide as having transcriptional or translational regulatory activity in a eukaryotic cell.
 2. The method of claim 1, wherein the expressible polynucleotide comprises a cloning site, whereby the oligonucleotide is operatively linked to the expressible polynucleotide by insertion into the cloning site.
 3. The method of claim 2, wherein the cloning site comprises a nucleotide sequence selected from a restriction endonuclease recognition site and recombinase recognition site.
 4. The method of claim 3, wherein the nucleotide sequence comprises a multiple cloning site, which comprises a plurality of restriction endonuclease recognition sites.
 5. The method of claim 3, wherein the cloning site is a recombinase recognition site selected from a lox sequence and an att sequence.
 6. The method of claim 1, wherein the expressible polynucleotide further comprises a transcription initiator sequence.
 7. The method of claim 1, wherein the expressible polynucleotide comprises a reporter polypeptide.
 8. The method of claim 7, wherein the reporter polypeptide is a fluorescent polypeptide.
 9. The method of claim 8, wherein the fluorescent polypeptide is selected from the group consisting of green fluorescent protein, cyan fluorescent protein, and red fluorescent protein.
 10. The method of claim 8, wherein the fluorescent polypeptide is a modified fluorescent polypeptide, which exhibits enhanced fluorescence as compared to the fluorescent polypeptide.
 11. The method of claim 7, wherein the reporter polypeptide is an antibiotic resistance polypeptide.
 12. The method of claim 11, wherein the antibiotic resistance protein is selected from puromycin N-acetyltransferase, hygromycin B phosphotransferase, neomycin (aminoglycoside) phosphotransferase, and the Sh ble gene product.
 13. The method of claim 7, wherein the reporter molecule is a cell surface protein marker.
 14. The method of claim 13, wherein the cell surface protein marker is neural cell adhesion molecule (N-CAM).
 15. The method of claim 7, wherein the reporter polypeptide is an enzyme.
 16. The method of claim 15, wherein the enzyme is selected from β-galactosidase, chloramphenicol acetyltransferase, luciferase, and alkaline phosphatase.
 17. The method of claim 1, wherein the oligonucleotide is an oligonucleotide to be examined for transcriptional regulatory activity.
 18. The method of claim 17, wherein the expressible polynucleotide comprises an operatively linked minimal promoter.
 19. The method of claim 18, wherein the minimal promoter is selected from a TATA box, a minimal enkephalin promoter, and a minimal SV40 early promoter.
 20. The method of claim 1, wherein the expressible polynucleotide comprises a dicistronic reporter cassette comprising, in operative linkage, a regulatory cassette comprising a minimal promoter and a cloning site, a first reporter cassette, a spacer sequence comprising an internal ribosome entry site (IRES), and a second reporter cassette, whereby the oligonucleotide is an oligonucleotide to be examined for transcriptional regulatory activity, and whereby the oligonucleotide is operatively linked to the dicistronic reporter cassette by insertion into the cloning site.
 21. The method of claim 1, wherein the expressible polynucleotide is contained in a vector.
 22. The method of claim 21, wherein the vector is selected from SEQ ID NO: 2 and SEQ ID NO: 3.
 23. The method of claim 21, wherein the vector is a retroviral vector.
 24. The method of claim 23, wherein the retroviral vector is selected from SEQ ID NO: 1 and SEQ ID NO: 9.
 25. The method of claim 1, wherein the oligonucleotide is operatively linked to the expressible polynucleotide prior to integrating into the eukaryotic cell genome.
 26. The method of claim 1, wherein the oligonucleotide is an oligonucleotide having translational regulatory activity.
 27. The method of claim 26, wherein the expressible polynucleotide comprises a promoter.
 28. The method of claim 27, wherein the promoter is a strong promoter.
 29. The method of claim 28, wherein the promoter is an RSV promoter.
 30. The method of claim 26, wherein the expressible polynucleotide comprises a dicistronic reporter cassette comprising, in operative linkage, a regulatory cassette comprising a promoter, a first reporter cassette, a spacer sequence comprising a cloning site, and a second reporter cassette, whereby the oligonucleotide is operatively linked to the second cistron by insertion into the cloning site.
 31. The method of claim 26, wherein the expressible polynucleotide is contained in a vector.
 32. The method of claim 31, wherein the vector is a retroviral vector.
 33. The method of claim 32, wherein the retroviral vector has a nucleotide sequence as set forth in SEQ ID NO: 109.
 34. The method of claim 1, wherein the expressible polynucleotide is an endogenous polynucleotide in the eukaryotic cell genome.
 35. The method of claim 34, wherein the oligonucleotide is operatively linked to the endogenous polynucleotide by homologous recombination.
 36. The method of claim 34, wherein the eukaryotic cell is a cell of a transgenic non-human eukaryote.
 37. The method of claim 36, wherein the eukaryotic cell comprises a transgene comprising a recombinase recognition site, whereby integrating the oligonucleotide into the eukaryotic cell genome comprises inserting the oligonucleotide into the recombinase recognition site.
 38. The method of claim 1, wherein the expressible polynucleotide comprises a transgene, which is stably maintained in the eukaryotic cell genome.
 39. The method of claim 38, wherein the oligonucleotide is an oligonucleotide to be examined for transcriptional regulatory activity, and wherein the expressible polynucleotide comprises a dicistronic reporter cassette comprising, in operative linkage, a regulatory cassette comprising a minimal promoter and a cloning site, a first reporter cassette, a spacer sequence comprising an internal ribosome entry site (IRES), and a second reporter cassette, whereby the oligonucleotide is operatively linked to the dicistronic reporter cassette by insertion into the cloning site.
 40. The method of claim 38, wherein the oligonucleotide is an oligonucleotide to be examined for translational regulatory activity, and wherein the expressible polynucleotide comprises a dicistronic reporter cassette comprising, in operative linkage, a regulatory cassette comprising a promoter, a first reporter cassette, a spacer sequence comprising a cloning site, and a second reporter cassette, whereby the oligonucleotide is operatively linked to the second cistron by insertion into the cloning site.
 41. The method of claim 1, wherein the oligonucleotide to be examined for transcriptional or translational regulatory activity is a synthetic oligonucleotide having a randomly generated nucleotide sequence.
 42. The method of claim 1, wherein the oligonucleotide to be examined for transcriptional or translational regulatory activity is a variegated oligonucleotide.
 43. The method of claim 1, wherein the oligonucleotide to be examined for transcriptional or translational regulatory activity is an oligonucleotide fragment of genomic DNA.
 44. The method of claim 1, wherein the oligonucleotide to be examined for transcriptional or translational regulatory activity is an oligonucleotide to be examined for translational regulatory activity.
 45. The method of claim 44, wherein the oligonucleotide to be examined for translational regulatory activity is a cDNA portion of a 5′ UTR of an mRNA.
 46. The method of claim 44, wherein the oligonucleotide to be examined for translational regulatory activity is complementary to an oligonucleotide sequence of a ribosomal RNA (rRNA).
 47. The method of claim 46, wherein the rRNA is 18S rRNA.
 48. The method of claim 46, wherein the oligonucleotide to be examined for translational regulatory activity comprises a variegated population of oligonucleotides, each of which is based on the oligonucleotide sequence of a rRNA.
 49. A method of identifying an oligonucleotide having transcriptional or translation regulatory activity in a eukaryotic cell, the method comprising: a) cloning a library of oligonucleotides to be examined for transcriptional or translation regulatory activity into multiple copies of an expression vector comprising an expressible polynucleotide, whereby the oligonucleotides are operatively linked to the expressible polynucleotide, thereby obtaining a library of vectors; b) contacting the library of vectors with eukaryotic cells under conditions such that the vectors are introduced into the cell and integrate into a chromosome in the cells; and c) detecting expression of an expressible polynucleotide operatively linked to an oligonucleotide at a level other than a level of expression of the expressible polynucleotide in the absence of the oligonucleotide, thereby identifying an oligonucleotide having transcriptional or translational regulatory activity in a eukaryotic cell.
 50. The method of claim 49, wherein the oligonucleotide is an oligonucleotide to be examined for transcriptional regulatory activity, and wherein the expressible polynucleotide comprises, in operative linkage, a regulatory cassette comprising a minimal promoter and a cloning site, and a reporter cassette, whereby the oligonucleotide is operatively linked to the expressible polynucleotide by insertion into the cloning site.
 51. The method of claim 49, wherein the oligonucleotide is an oligonucleotide to be examined for transcriptional regulatory activity, and wherein the expressible polynucleotide comprises a dicistronic reporter cassette comprising, in operative linkage, a regulatory cassette comprising a minimal promoter and a cloning site, a first reporter cassette, a spacer sequence comprising an internal ribosome entry site (IRES), and a second reporter cassette, whereby the oligonucleotide is operatively linked to the dicistronic reporter cassette by insertion into the cloning site.
 52. The method of claim 51, wherein the expressible polynucleotide is contained in a vector.
 53. The method of claim 52, wherein the vector is selected from SEQ ID NO: 2 and SEQ ID NO: 3.
 54. The method of claim 52, wherein the vector is a retroviral vector.
 55. The method of claim 54, wherein the retroviral vector is selected from SEQ ID NO: 1 and SEQ ID NO: 9.
 56. The method of claim 49, wherein the library of oligonucleotides to be examined for transcriptional or translation regulatory activity is a library of random oligonucleotides.
 57. The method of claim 49, wherein the library of oligonucleotides to be examined for transcriptional or translation regulatory activity is a library of cDNA molecules, each encoding a portion of a 5′ untranslated region of an mRNA.
 58. The method of claim 49, wherein the library of oligonucleotides to be examined for transcriptional or translation regulatory activity is a library of genomic DNA fragments.
 59. The method of claim 49, wherein the library of oligonucleotides to be examined for transcriptional or translation regulatory activity is a library of variegated oligonucleotides, each of which is based on an oligonucleotide sequence complementary to an oligonucleotide sequence of a ribosomal RNA.
 60. The method of claim 49, further comprising selecting a population of cells expressing the expressible polynucleotide operatively linked to an oligonucleotide at a level other than a level of expression of the expressible polynucleotide in the absence of the oligonucleotide.
 61. The method of claim 60, further comprising isolating the operatively linked oligonucleotide.
 62. An isolated transcriptional regulatory element obtained by the method of claim 61.
 63. A recombinant nucleic acid molecule comprising a plurality of operatively linked isolated transcriptional regulatory elements of claim 62.
 64. The recombinant nucleic acid molecule of claim 63, wherein the plurality comprises a plurality of different isolated transcriptional regulatory elements.
 65. The method of claim 49, wherein the oligonucleotide is an oligonucleotide to be examined for translational regulatory activity, and wherein the expressible polynucleotide comprises a dicistronic reporter cassette comprising, in operative linkage, a regulatory cassette comprising a promoter, a first reporter cassette, a spacer sequence comprising a cloning site, and a second reporter cassette, whereby the oligonucleotide is operatively linked to the second cistron by insertion into the cloning site.
66. The method of claim 65, wherein the oligonucleotide to be examined for transcriptional or translational regulatory activity is a synthetic oligonucleotide having a randomly generated nucleotide sequence.
67. The method of claim 65, wherein the oligonucleotide to be examined for transcriptional or translational regulatory activity is selected from a variegated oligonucleotide, a cDNA portion of a 5′ UTR of an mRNA, and an oligonucleotide fragment of genomic DNA.
68. The method of claim 49, wherein the expressible polynucleotide is contained in a vector.
69. The method of claim 68, wherein the vector is a retroviral vector.
70. The method of claim 69, wherein the retroviral vector has a nucleotide sequence as set forth in SEQ ID NO: 109.
71. The method of claim 67, further comprising selecting a population of cells expressing the expressible polynucleotide operatively linked to an oligonucleotide at a level other than a level of expression of the expressible polynucleotide in the absence of the oligonucleotide.
72. The method of claim 71, further comprising isolating the operatively linked oligonucleotide.
73. An isolated translational regulatory element obtained by the method of claim 72.
74. The translational regulatory element of claim 73, which is an IRES element.
75. A recombinant nucleic acid molecule comprising a plurality of operatively linked isolated translational regulatory elements of claim 74.
76. The recombinant nucleic acid molecule of claim 75, wherein the plurality comprises a plurality of different isolated translational regulatory elements.
77. The method of claim 49, wherein the eukaryotic cell is a mammalian cell.
78. The method of claim 77, wherein the mammalian cell is a neuronal cell.
79. The method of claim 49, wherein the library of oligonucleotides comprises a library of randomized oligonucleotides.
80. An integrating expression vector, comprising, in operative linkage in a 5′ to 3′ orientation, a long terminal repeat (LTR) containing an immediate early gene promoter, an R region, a U5 region, a truncated gag gene comprising sequences required for retrovirus packaging, a dicistronic reporter cassette comprising a first reporter cassette, a spacer sequence comprising an internal ribosome entry site (IRES), a second reporter cassette, and a regulatory cassette comprising a cloning site and a minimal promoter, and an LTR.
81. The integrating expression vector of claim 80, wherein the first reporter cassette and second reporter cassette each independently encode a reporter polypeptide.
82. The integrating expression vector of claim 81, wherein the reporter polypeptide is a fluorescent polypeptide.
83. The integrating expression vector of claim 82, wherein the fluorescent polypeptide is selected from the group consisting of green fluorescent protein, cyan fluorescent protein, and red fluorescent protein.
84. The integrating expression vector of claim 82, wherein the fluorescent polypeptide is a modified fluorescent polypeptide, which exhibits enhanced fluorescence as compared to the fluorescent polypeptide.
85. The integrating expression vector of claim 81, wherein the reporter polypeptide is an antibiotic resistance polypeptide.
86. The integrating expression vector of claim 85, wherein the antibiotic resistance protein is selected from puromycin N-acetyltransferase, hygromycin B phosphotransferase, neomycin (aminoglycoside) phosphotransferase, and the Sh ble gene product.
87. The integrating expression vector of claim 81, wherein the reporter polypeptide is a cell surface protein marker.
88. The integrating expression vector of claim 87, wherein the cell surface protein marker is neural cell adhesion molecule (N-CAM).
89. The integrating expression vector of claim 81, wherein the reporter polypeptide is an enzyme.
90. The integrating expression vector of claim 89, wherein the enzyme is selected from β-galactosidase, chloramphenicol acetyltransferase, luciferase, and alkaline phosphatase.
91. The integrating expression vector of claim 80, wherein the cloning site comprises a nucleotide sequence selected from a restriction endonuclease recognition site and a recombinase recognition site.
92. The integrating expression vector of claim 80, wherein the nucleotide sequence comprises a multiple cloning site, which comprises a plurality of restriction endonuclease recognition sites.
93. The integrating expression vector of claim 80, wherein the cloning site is a recombinase recognition site selected from a lox sequence and an att sequence.
94. The integrating expression vector of claim 80, wherein the minimal promoter is selected from a TATA box, a minimal enkephalin promoter, and a minimal SV40 early promoter.
95. The integrating expression vector of claim 80, which has a nucleotide sequence selected from SEQ ID NO: 1 and SEQ ID NO: 9.
96. A vector having a nucleotide sequence selected from SEQ ID NO: 2 and SEQ ID NO: 3.
97. An integrating expression vector, comprising, in operative linkage in a 5′ to 3′ orientation, a long terminal repeat (LTR) containing an immediate early gene promoter, an R region, a U5 region, a truncated gag gene comprising sequences required for retrovirus packaging, a dicistronic reporter cassette comprising a first reporter cassette, a spacer sequence comprising a cloning site, a second reporter cassette, and a regulatory cassette comprising a promoter, and an LTR.
98. The integrating expression vector of claim 97, wherein the first reporter cassette and second reporter cassette each independently encode a reporter polypeptide.
99. The integrating expression vector of claim 98, wherein the reporter polypeptide independently is selected from a fluorescent polypeptide, an antibiotic resistance polypeptide, a cell surface protein marker, an enzyme, and a peptide tag.
100. The integrating expression vector of claim 99, wherein the reporter polypeptide of the first reporter cassette is puromycin N-acetyltransferase and wherein the reporter polypeptide of the second reporter cassette is enhanced green fluorescent protein.
101. The integrating expression vector of claim 99, wherein the reporter polypeptide of the first reporter cassette is puromycin N-acetyltransferase and wherein the reporter polypeptide of the second reporter cassette is N-CAM.
102. The integrating expression vector of claim 98, wherein the cloning site comprises a nucleotide sequence selected from a restriction endonuclease recognition site and a recombinase recognition site.
103. The integrating expression vector of claim 98, which has a nucleotide sequence as set forth in SEQ ID NO: 109.
104. A kit, comprising an integrating expression vector of claim 80.
105. A kit, comprising the integrating expression vector of claim 97.
106. A kit, comprising an isolated synthetic transcriptional or translational regulatory oligonucleotide identified by the method of claim 1.
107. The kit of claim 106, further comprising a vector for containing the oligonucleotide.
108. The kit of claim 106, comprising a plurality of isolated synthetic transcriptional or translational regulatory oligonucleotides.
109. An isolated transcriptional regulatory element selected from any of SEQ ID NOS: 10, 11, 13, 15 and 15.